AI could soon take on projects that take people weeks

Artificial intelligence is improving rapidly and could soon take on projects that take people weeks. An analysis of leading AI models suggests that by 2029 they could handle tasks requiring about a month of human expertise.

Today's artificial intelligence (AI) systems cannot outperform people on long tasks, but they are improving rapidly and could close the gap sooner than many expected, according to an analysis of leading models [1].

The non-profit organization METR, based in Berkeley, California, created almost 170 real-world tasks in programming, cybersecurity, general reasoning and machine learning. It then established a "human baseline" by measuring how long experts needed to complete these tasks.

The team then developed a metric for assessing the progress of AI models, which it calls the "task-completion time horizon". This is the length of time that programmers typically need to complete the tasks that AI models can complete at a given success rate.

In a preprint posted on arXiv this week, METR reports that GPT-2, an early large language model (LLM) released by OpenAI in 2019, failed on all tasks that took human experts more than one minute. Claude 3.7 Sonnet, released in February by the US start-up Anthropic, completed 50% of the tasks that would take people 59 minutes.

Overall, the time horizon of the 13 leading AI models has doubled about every seven months since 2019, according to the study. The exponential growth of AI time horizons accelerated in 2024, with the latest models doubling their horizons roughly every three months. The work has not yet been formally peer reviewed.

Extrapolating from the progress made between 2019 and 2024, METR suggests that by 2029, and possibly earlier, AI models will be able to handle tasks that take people about a month, with a reliability of 50%.
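The arithmetic behind such an extrapolation can be sketched in a few lines. This is only an illustration, not METR's calculation: it starts from the 59-minute horizon reported for Claude 3.7 Sonnet, assumes that growth stays perfectly exponential, and assumes that "a month" of work means about 167 working hours. The function name `months_until` is hypothetical.

```python
import math

def months_until(target_minutes, current_minutes, doubling_months):
    """Months of steady exponential growth needed to reach the target horizon."""
    doublings = math.log2(target_minutes / current_minutes)
    return doublings * doubling_months

CURRENT_HORIZON_MIN = 59       # Claude 3.7 Sonnet at 50% success (from the article)
WORK_MONTH_MIN = 167 * 60      # assumption: about 167 working hours in a month

# Doubling times quoted in the article: about seven months overall,
# about three months for the latest models.
slow = months_until(WORK_MONTH_MIN, CURRENT_HORIZON_MIN, 7)
fast = months_until(WORK_MONTH_MIN, CURRENT_HORIZON_MIN, 3)

print(f"7-month doubling: about {slow:.0f} months")
print(f"3-month doubling: about {fast:.0f} months")
```

Under these assumptions, the seven-month doubling time gives roughly 52 months, which from early 2025 lands in 2029. The three-month doubling time gives roughly 22 months, in line with the "possibly even earlier" caveat.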

According to the paper, one month of dedicated human expertise can be enough to start a new company or make scientific discoveries.

Joshua Gans, a professor of management at the University of Toronto in Canada who has written about the economics of AI, says that such predictions are not particularly useful. "Extrapolations are tempting, but there is still so much that we do not know about how AI will actually be used for these predictions to be meaningful," he says.

Assessing humans versus AI

The team chose the 50% success rate because it was the most robust to small changes in the distribution of the data. "If you choose very low or very high threshold values, adding or removing a single successful or failed task changes the estimate considerably," explains co-author Lawrence Chan.

Raising the reliability threshold from 50% to 80% reduced the average time horizon by a factor of five, although the overall doubling time and the trend line were similar.
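One way to picture how a time horizon depends on the chosen success rate is to fit a curve of success probability against task length and read off where it crosses the threshold. The sketch below is not METR's code or data: the twelve task outcomes are invented for illustration, the logistic curve in log2(minutes) is an assumed model, and the names `fit_logistic` and `horizon_minutes` are hypothetical.

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Fit P(success) = sigmoid(a + b*x) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to a
            g1 += (y - p) * x      # gradient with respect to b
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def horizon_minutes(a, b, success_rate):
    """Task length (human minutes) at which the fitted success rate is reached."""
    logit = math.log(success_rate / (1.0 - success_rate))
    return 2.0 ** ((logit - a) / b)

# Invented example outcomes: (human minutes, 1 = model succeeded, 0 = failed)
tasks = [(1, 1), (2, 1), (4, 1), (8, 1), (16, 1), (32, 1),
         (16, 0), (32, 0), (64, 0), (128, 0), (256, 0), (512, 0)]
xs = [math.log2(minutes) for minutes, _ in tasks]
ys = [outcome for _, outcome in tasks]

a, b = fit_logistic(xs, ys)
h50 = horizon_minutes(a, b, 0.5)
h80 = horizon_minutes(a, b, 0.8)
print(f"50% horizon: {h50:.1f} min, 80% horizon: {h80:.1f} min")
```

In this toy data set the 80% horizon comes out shorter than the 50% horizon, which is the direction of the effect the paper reports. The size of the gap depends entirely on the invented data and should not be compared with the factor of five found by METR.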

Over the past five years, improvements in the general capabilities of LLMs have been driven mainly by increases in scale: the amount of training data, the training time and the number of model parameters. The paper attributes progress on the time-horizon metric mainly to improvements in logical reasoning, tool use, error correction and self-awareness in carrying out tasks.

METR's time-horizon approach addresses some of the limitations of existing AI benchmarks, which correspond only loosely to real-world work and quickly become “saturated” as models improve. It offers a continuous, intuitive measure that better captures meaningful progress over a longer period of time, says co-author Ben West.

Leading AI models achieve superhuman performance on many benchmark tests but have had relatively little economic impact so far, West explains. METR's latest research offers a partial answer to this puzzle: the best models have a time horizon of about 40 minutes, and there is not much economically valuable work that a person can do in that time, according to West.

Anton Troynikov, an AI researcher and entrepreneur from San Francisco, California, says that AI would have a greater economic impact if organizations were more willing to experiment and to invest in using the models effectively.

  1. Kwa, T. et al. Preprint at arXiv https://doi.org/10.48550/arxiv.2503.14499 (2025).
