viva la vida: Shocking! AI Agents Have Their Own "Moore's Law", with Capabilities Doubling Every Seven Months, Potentially Disrupting Task Patterns in the Next Five Years

In this era of rapid technological change, every breakthrough in AI draws intense attention. A recent report in Nature dropped a bombshell: new research from the non-profit research institute METR suggests that AI agents have their own "Moore's Law"!

Moore's Law is the observation that the number of transistors that fit on an integrated circuit doubles roughly every two years, with performance improving accordingly. The progress of AI agents on long tasks appears to follow a similar pattern: the length of task they can complete, measured by how long the task takes a human, doubles roughly every seven months.

To measure how the ability of agents to complete tasks autonomously changes over time, the researchers proposed a new metric, the "50%-task-completion time horizon": the length of task, measured in human working time, that an AI can complete with a 50% success rate. As an illustration, suppose that in 2019 AI reached a 50% success rate on tasks that take a human 10 minutes; just seven months later, the corresponding human task time would be 20 minutes. In other words, AI keeps breaking through its own limits and handling tasks that take humans longer and longer. The pace appears to have picked up further in 2024, with the time horizon of some of the latest models doubling roughly every three months. If this trend holds, in about five years AI could automatically complete many tasks that currently take humans a month, which would be a huge shock to traditional ways of working.
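The doubling arithmetic above can be sketched in a few lines of Python. This is only an illustration of steady exponential growth, not METR's code; the function names, the 1-hour starting horizon, and the 167-hour work-month in the usage lines are assumptions made for the example.

```python
import math

def horizon_after(start_minutes, months_elapsed, doubling_months=7.0):
    """Time horizon after `months_elapsed`, assuming it doubles every `doubling_months`."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)

def months_to_reach(start_minutes, target_minutes, doubling_months=7.0):
    """Months needed to grow from one horizon to another at a fixed doubling time."""
    return doubling_months * math.log2(target_minutes / start_minutes)

# The article's illustration: 10 minutes becomes 20 minutes after 7 months.
print(horizon_after(10, 7))  # 20.0

# Hypothetical extrapolation: from a 1-hour horizon to one work-month,
# taking a work-month to be 167 hours (an assumption for this sketch).
print(months_to_reach(60, 167 * 60) / 12, "years at a 7-month doubling time")
```

Under those assumptions the second line comes out at a little over four years, which is in line with the "about five years" figure quoted above.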

So, how exactly did METR conduct this research? First, to evaluate the capabilities of AI models comprehensively, they assembled three different task suites: 97 HCAST tasks, covering software engineering, machine learning, cybersecurity, and general reasoning challenges, with human completion times ranging from a few minutes to 30 hours; 7 RE-Bench tasks, open-ended machine learning research engineering environments that each take human experts about 8 hours; and 66 SWAA tasks, which represent individual steps in software development and take from 1 second to 30 seconds.

To make the evaluation more reliable, the team recruited more than 800 professionals from software engineering, machine learning, and cybersecurity to attempt the tasks, and recorded in detail how long they took. These times became the yardstick for task difficulty. The researchers then ran 13 frontier AI models released between 2019 and 2025, including the well-known GPT series, o1, and Claude 3.7 Sonnet, on the task suites and recorded their success rates.

This is where the "50%-task-completion time horizon" comes into play. A 50% success rate was chosen as the threshold because it is the most robust to small changes in the data distribution. As Lawrence Chan, one of the paper's authors, said: "If you choose a very low or very high threshold, then removing or adding just one successful or failed task respectively will have a great impact on your estimated value." By fitting a logistic regression to each model's successes and failures across tasks, the team calculated each model's time horizon, that is, the human completion time at which the model's predicted success rate is 50%.
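To make the idea concrete, here is a minimal sketch of reading a 50% time horizon off a logistic fit. It is not METR's code: the task times and outcomes are made-up toy data, the predictor is assumed to be the base-2 logarithm of human completion time, and the fit uses plain gradient ascent rather than a statistics library.

```python
import math

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def horizon_minutes(a, b, p=0.5):
    """Human task length (minutes) at which the fitted success rate equals p."""
    logit = math.log(p / (1.0 - p))   # 0 when p = 0.5
    return 2 ** ((logit - a) / b)     # invert x = log2(minutes)

# Toy data (hypothetical): human minutes per task and whether the model succeeded.
human_minutes = [1, 2, 4, 8, 15, 30, 60, 120, 240, 480]
succeeded     = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]

xs = [math.log2(m) for m in human_minutes]
a, b = fit_logistic(xs, succeeded)
print("50% time horizon (minutes):", horizon_minutes(a, b))
```

Calling `horizon_minutes(a, b, p=0.8)` on the same fit gives the horizon at a stricter threshold, which is the kind of alternative Chan's remark warns is more sensitive to individual successes and failures.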

With these estimates in hand, the team plotted each model's time horizon against its release date and found the striking pattern: since 2019, the time horizon of AI models has grown exponentially, doubling approximately every seven months. To test the external validity of this finding, the team also ran four additional experiments: using data from 2023-2025 for retrospective prediction to check the consistency of the trend; rating the HCAST and RE-Bench tasks on 16 "messiness" factors to analyze how task messiness affects model performance; applying the same method to another dataset, SWE-bench Verified, and comparing the results; and testing model performance on internal pull request (PR) tasks against a human baseline.
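An exponential trend is a straight line once the time horizon is put on a log scale, so a doubling time can be recovered from the slope of a simple least-squares fit. The sketch below shows that step under stated assumptions; the data points are synthetic, built from the 10-minute illustration with an exact seven-month doubling, and are not METR's measurements.

```python
import math

def doubling_time(months, horizons):
    """Doubling time in months, from a least-squares line through log2(horizon) vs. time."""
    ys = [math.log2(h) for h in horizons]
    n = len(months)
    mean_x = sum(months) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(months, ys))
             / sum((x - mean_x) ** 2 for x in months))
    return 1.0 / slope  # slope is doublings per month

# Synthetic points (hypothetical): months since the first model, horizon in minutes.
months   = [0, 7, 14, 21, 28]
horizons = [10, 20, 40, 80, 160]
print(doubling_time(months, horizons))  # approximately 7.0
```

With real, noisy measurements the fitted line would not pass through every point, and the estimated doubling time would carry uncertainty that this sketch does not compute.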

These experiments further supported the reliability of the trend. For example, in the "messiness" analysis, although the absolute performance of AI models is lower on messier tasks, the rate of improvement is quite stable and does not depend on how messy the tasks are. The check on the SWE-bench Verified benchmark also showed a similar exponential growth trend, though, because of issues with its time annotations, the doubling time there is shorter.

Extrapolating from this "Moore's Law for AI Agents", AI may reach a one-month time horizon in November 2028; even under a more conservative estimate, it may get there in February 2031. Of course, the METR team is well aware that the research still has room for improvement, including the limitations of the task suite, the imperfection of the evaluation metric, and the uncertainty of future AI development. But they are confident that the growth trend of this metric, somewhere between 1 and 4 doublings per year, is robust.
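If the range is read as 1 to 4 doublings per year, it can be converted into doubling times and yearly growth factors with a little arithmetic. The helper names below are illustrative, and the sketch assumes steady exponential growth.

```python
def doubling_months(doublings_per_year):
    """Months per doubling for a given number of doublings per year."""
    return 12.0 / doublings_per_year

def yearly_growth_factor(doublings_per_year):
    """How many times larger the time horizon becomes after one year."""
    return 2.0 ** doublings_per_year

print(doubling_months(1), yearly_growth_factor(1))  # 12.0 2.0
print(doubling_months(4), yearly_growth_factor(4))  # 3.0 16.0

# The headline seven-month doubling time, expressed as doublings per year.
print(12.0 / 7.0)
```

Both the seven-month and three-month doubling times mentioned in this post fall inside that range.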

Combined with the recent surge in popularity of the Manus agent, we can already glimpse the explosive growth that agents may be about to see. The "Moore's Law for AI Agents" not only describes how AI capabilities are growing but also fills us with imagination and anticipation for the future. Perhaps in the near future, AI will take over some of the work done by humans in more fields, bringing unprecedented changes to our lives and work. Let's wait and see, and witness the arrival of the era of AI agents!

viva la vida

Sunday, March 23, 2025
