ByteDance, which has refused to integrate DeepSeek into Doubao, has made new moves in its self-developed deep thinking functionality.
On March 18, the Doubao app fully launched its deep thinking functionality. Just ten days later, Doubao updated this feature again. The main highlight of this update is the integration of web search capabilities into the thinking process, achieving a "think while searching" approach. In simple terms, Doubao has merged the web search button with the deep thinking button.
After DeepSeek R1 became a sensation, deep thinking and web search became the new design standards for AI assistant products. So what specific changes does Doubao's new design, which merges the two, actually bring? Hands-on testing shows that beyond removing the web search button from the page layout, the more important change is a reconstruction of the large model's reasoning process.
When DeepSeek R1 enables web search, the reasoning process first searches web pages and then thinks based on the content of those pages, usually performing only one round of searching. However, Doubao's deep thinking mode with integrated web search first thinks and then searches relevant web pages based on that thinking, combining the specific content of the web pages to further think, often performing 2-3 rounds of searching.
Li Zhen, who is engaged in large model entrepreneurship in China, metaphorically described it as: "Doubao has essentially turned web search into an agent embedded within the deep thinking functionality." From a technical standpoint, Doubao's deep thinking functionality with embedded web search is similar to OpenAI's Deep Research or Grok 3's DeepSearch functionality. These DeepSearch-type agents are characterized by their ability to control web browsers to obtain real-time information, thus possessing the capability to autonomously execute simple web tasks.
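The loop Li Zhen describes can be sketched in a few lines. Everything below, including the function names, the toy stopping rule, and the stubbed search, is an illustrative assumption rather than Doubao's actual implementation; the point is only the interleaving of thinking and searching across multiple rounds, instead of one search up front.

```python
# Minimal sketch of a "think while searching" loop.
# All names and the toy logic are assumptions for illustration only.

def think(question, evidence):
    """Toy 'reasoning' step: decide the next search query, or None if done."""
    if not evidence:
        return f"background: {question}"   # round 1: broad query
    if len(evidence) == 1:
        return f"details: {question}"      # round 2: follow-up query
    return None                            # enough evidence gathered

def search(query):
    """Stub web search returning canned snippets."""
    return [f"snippet for '{query}'"]

def deep_think_with_search(question, max_rounds=3):
    """Interleave thinking and searching across multiple rounds."""
    evidence, rounds = [], 0
    while rounds < max_rounds:
        query = think(question, evidence)
        if query is None:          # the "thinking" step decides when to stop
            break
        evidence.extend(search(query))
        rounds += 1
    return {"rounds": rounds, "evidence": evidence}

result = deep_think_with_search("today's weather in Beijing")
```

In this toy version the controller happens to stop after two rounds; the article's observation that even a weather query triggered four rounds suggests the real stopping rule is much less conservative.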
However, in daily life, not all problems require activating the deep thinking agent mode. This update by Doubao has also brought some issues. Due to the forced embedding of web search into the deep thinking process, even simple problems require mechanical multi-round searching, which causes unnecessary waiting time for users. For example, when asking about "today's weather in Beijing," Doubao only provided an answer after four rounds of searching.
Still, it is a commendable step: by removing the web search button, Doubao is experimenting with making the way AI searches for and answers questions more human-like, and to some extent has begun competing with DeepSeek over who gets to define product design.
Regarding AI assistant products sprouting ever more functions and buttons, Liu Kai (pseudonym), a product manager at a leading domestic company, attributes it to the "leaky abstraction" principle surfacing in the AI field. In software development, a leaky abstraction is an abstraction meant to hide implementation details that nonetheless exposes its underlying details and limitations. In AI product design, it shows up as users being forced to understand model choices, such as the difference between base models and reasoning models, or between enabling web search and enabling deep thinking, a growing departure from the ideal of a seamless experience.
However, with the iteration of models, this phenomenon is expected to change. From Anthropic's release of the world's first hybrid model Claude 3.7 Sonnet to OpenAI CEO Sam Altman's preview of the unified model GPT-5, the future consensus is moving towards a single model solving all problems. Similarly, a single button solving all user needs may also become the ultimate direction of product evolution.
Looking back at the development of AI assistant products, before DeepSeek R1 became a sensation earlier this year, web search had never appeared as a standalone button. When ChatGPT launched at the end of 2022, it could not search the web: the model's knowledge stopped at July 2021, with no way to take in newer information. When Baidu's ERNIE Bot was released in March 2023, its retrieval-augmented generation (RAG) technology became one of the product's selling points, letting the model fetch real-time information and also helping to mitigate hallucinations. The web search button introduced with DeepSeek's R1 reasoning model built on RAG to further increase the number and richness of web pages the model retrieves.

For example, when asking about "today's weather in Beijing," a base model backed by RAG, without deep thinking or web search enabled, can usually retrieve only a single-digit number of web page links; in deep thinking mode with web search enabled, that number can surge to dozens. DeepSeek's model knowledge base is currently only updated through July 2024, so with web search disabled, R1 will tell the user it cannot provide real-time weather data and suggest enabling the web search function.
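The RAG mechanism mentioned above can be illustrated with a minimal sketch: retrieve the passages most relevant to the query, then prepend them to the prompt so the model answers from fresh evidence rather than stale training data. The word-overlap scorer and toy corpus below are assumed stand-ins for a real retriever and live web results.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# The scorer and corpus are illustrative assumptions, not any vendor's system.

def retrieve(query, corpus, k=2):
    """Rank documents by naive word overlap with the query."""
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(query, corpus):
    """Prepend retrieved passages so the model answers from fresh evidence."""
    passages = retrieve(query, corpus)
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Beijing weather today: sunny, high of 21C.",
    "Shanghai weather today: light rain.",
    "History of the Forbidden City in Beijing.",
]
prompt = build_prompt("today weather in Beijing", corpus)
```

The article's point about single digits versus dozens of links corresponds to the `k` parameter here: deep thinking mode with search effectively raises how many passages are pulled into the context.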
So why did large model makers only design web search as a separate button once they entered the reasoning model stage? AI commercialization expert Dr. Ding Kun explained that the primary reason is controlling compute cost. Deep thinking is itself compute-hungry, and running a web search on every reasoning pass would significantly increase resource consumption. After R1 became a sensation, NVIDIA founder Jensen Huang emphasized repeatedly that reasoning models are consuming more and more computing power, and future ones will consume even more.

From a commercialization perspective, separating deep thinking and web search also helps create product tiers and nudges users toward paid versions, as is evident in products from OpenAI, Anthropic, and Grok. Although OpenAI opened reasoning to free ChatGPT users in February, there are limits on reasoning depth and usage: free users only get the o3-mini model's reasoning capabilities, and to use more powerful options such as o1-pro or the high-performance version of o3-mini, users must pay $20 or $200 per month for OpenAI's Plus or Pro subscriptions.

Finally, from a user experience perspective, keeping the two functions separate balances users' needs for answer speed against answer quality. For questions that are not time-sensitive, users can choose deep thinking alone and get answers faster.
Doubao's merging of web search and deep thinking also reflects several considerations. Liu Kai analyzed that during product growth, the internal focus is on the user funnel: acquisition, activation, retention, and conversion to paying users. For acquiring new users, AI companies at home and abroad increasingly prize "curiosity traffic": as long as a product has some unique, magical-seeming feature, or even just a cool demo of something not yet shipped, it can draw users in to try it.

One way to stimulate curiosity traffic is small product optimizations; sometimes a small change can significantly grow an AI product's active user base. Last year Kimi stood out by focusing on long texts, and this year DeepSeek showcased its chain of thought, both achieving viral spread through curiosity-driven product updates. But not every update catches on. Earlier this year, Doubao's 1.5 Pro model was released nearly two days after DeepSeek R1 and, despite lower pre-training and inference costs than DeepSeek V3, drew little attention because its model experience fell short. Doubao's latest optimization, removing the web search button, has likewise not produced a notable industry impact in the week since launch.
Among domestic AI assistant products, Doubao has the most urgent need for scale. Before DeepSeek's breakout, Doubao was the AI assistant application with the most monthly active users in China. After being overtaken, according to a report by LatePost, ByteDance CEO Liang Rubo listed strengthening scale effects among the key goals for 2025 at the February all-hands meeting. QuestMobile data shows that as of March 4, DeepSeek and Doubao had 48.85 million and 29.47 million daily active users, respectively. Liang Rubo set a new DAU target for Doubao of over 50 million this year, meaning its daily active user base needs to nearly double over the next three quarters.

What supports Doubao's pursuit of a larger user base, and the compute-hungry embedding of web search into deep thinking, is ByteDance's abundant GPU resources. Thanks to chips accumulated during the recommendation algorithm era, ByteDance's GPU inventory was reported in 2023 to exceed 100,000 units, and according to the latest foreign media reports, its AI compute procurement budget for 2025 will exceed 90 billion yuan.
After Doubao's attempt to merge functionalities, some large model manufacturers have followed suit. Recently, Baidu launched a new "automatic mode" through a combination of "self-developed + open-source models." In this mode, the large model can automatically recognize user needs and autonomously select appropriate models to generate answers. The product interface no longer displays the web search button and has also hidden the deep thinking button.
Before this merging of functions, the past month had already seen frequent experiments in merging models. On March 25, DeepSeek officially announced an update to V3. The new V3-0324, though not a reasoning model, takes on some characteristics of R1: the official technical notes state that V3-0324 shares the same base model as the previous V3 but improves the post-training methods, borrowing reinforcement learning techniques from the R1 reasoning model's training process. Around the same time, Tencent's new Hunyuan T1 official reasoning model, while maintaining refined and accurate output, applied a hybrid Mamba architecture to ultra-large reasoning models for the first time, combining fast and slow thinking to shorten users' wait for generated results.

Large model companies abroad are moving toward unified models as well. Altman, in discussing plans for GPT-5, has said that models and product features have grown too complex and that OpenAI will unify them: the o3 reasoning model will no longer ship separately, and GPT-4.5 will be the last base model without chain-of-thought. Anthropic went further, releasing the "world's first hybrid model," Claude 3.7 Sonnet, in late February, integrating real-time response (fast thinking) and deep thinking (slow thinking) into a single architecture. Users no longer need to switch between models; the model itself judges whether the current problem needs deep thinking.
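The hybrid design attributed to Claude 3.7 Sonnet, one entry point that decides for itself whether a query merits slow deep thinking, can be caricatured as a router. The keyword heuristic below is purely an assumed stand-in; in a real hybrid model this judgment is learned by the model, not hand-written rules.

```python
# Toy sketch of a hybrid fast/slow interface: one entry point, and the
# system itself decides whether "deep thinking" is needed.
# The heuristic is an assumption for illustration only.

HARD_MARKERS = ("prove", "compare", "analyze", "step by step")

def needs_deep_thinking(query):
    """Crude stand-in for a learned difficulty judgment."""
    q = query.lower()
    return any(marker in q for marker in HARD_MARKERS) or len(q.split()) > 12

def answer(query):
    """Route the query to a fast or slow path behind a single button."""
    if needs_deep_thinking(query):
        return {"mode": "slow", "answer": f"[carefully reasoned answer to: {query}]"}
    return {"mode": "fast", "answer": f"[quick answer to: {query}]"}
```

From the user's side this is exactly the "one button" endpoint the article describes: simple questions take the fast path, hard ones silently engage the slow path.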
Looking ahead, as models unify, Liu Kai believes the many function buttons on model interfaces will return to simpler designs, bringing the AI product experience ever closer to human interaction. Today's proliferation of function labels in AI assistants reflects the reality of low user penetration for large model products. "Most users try them out of curiosity, then either forget about them or don't know what to ask," Liu Kai said. To spur engagement, current AI assistants often borrow design ideas from search engines, such as push notifications or news links placed beneath the input box for users to click.

Another reason for the many function labels is that large model technology has not yet reached a mature, stable period. Most manufacturers at home and abroad are designing products atop highly non-deterministic systems, leading to a "model-centric rather than application-centric" approach. Li Zhen explained that ChatGPT was originally built to showcase OpenAI's model capabilities, not designed from the start as a consumer mass-market application. Even Altman recently admitted in an interview that they had been operating to the standards of a research lab and did not expect to become a consumer technology company. As users grow more familiar with these models, more manufacturers are starting to focus on core user experience, "which is the process of products becoming more human-like, step by step," Li Zhen explained.