AI will soon take over projects that take humans weeks

Artificial intelligence is improving rapidly and could soon take on projects that take humans weeks to complete. An analysis of leading AI models suggests that, by 2029, they could handle tasks that currently require about a month of human expertise.

Today's artificial intelligence (AI) systems cannot outperform humans on long tasks, but they are improving rapidly and could close the gap faster than many expect, according to an analysis of leading models [1].

The Berkeley, California-based nonprofit METR developed nearly 170 real-world tasks in programming, cybersecurity, general reasoning and machine learning, then established a "human baseline" by measuring the time it took experts to complete those tasks.

The team then devised a metric to track the progress of AI models, called the "task-completion time horizon": the length of time it typically takes human experts to complete the tasks that an AI model can finish with a given success rate.
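As a rough illustration of how such a horizon can be computed, here is a minimal sketch (not METR's actual code or data) that fits a logistic curve of model success against the logarithm of human completion time and solves for the point where the predicted success rate is 50%. All task times and outcomes below are made up.

```python
# Minimal sketch of a 50% time-horizon estimate (illustrative data, not METR's).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tasks: human completion time in minutes, and whether the AI model succeeded.
human_minutes = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480])
model_succeeded = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0])

# Fit success probability as a logistic function of log(human time).
X = np.log(human_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, model_succeeded)

# The predicted probability is 0.5 where intercept + coef * log(t) = 0,
# so the 50% horizon is t = exp(-intercept / coef).
horizon = np.exp(-clf.intercept_[0] / clf.coef_[0, 0])
print(f"Estimated 50% time horizon: {horizon:.0f} human-minutes")
```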

In a preprint posted this week on arXiv, METR reports that GPT-2, an early large language model (LLM) released by OpenAI in 2019, failed at every task that took human experts more than a minute. Claude 3.7 Sonnet, released in February 2025 by the US startup Anthropic, completed 50% of tasks that would take humans 59 minutes.

Overall, the time horizons of the 13 leading AI models studied have doubled approximately every seven months since 2019, according to the study. This exponential growth accelerated in 2024, with the latest models doubling their horizon roughly every three months. The work has not yet been peer reviewed.

Extrapolating the trend from 2019 to 2024, METR predicts that AI models will be able to complete tasks that take humans about a month, with 50% reliability, by 2029, perhaps even sooner.
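For intuition about how that date falls out of the trend, the back-of-the-envelope arithmetic below extrapolates from a roughly one-hour horizon in early 2025 under the two doubling rates the study reports. The target of "one month of human work" is an assumption here (40 hours per week for about 4.3 weeks), not METR's precise figure.

```python
# Back-of-the-envelope extrapolation (assumed figures, not METR's model).
import math

current_horizon_min = 59             # ~Claude 3.7 Sonnet's 50% horizon, in minutes
one_month_min = 40 * 60 * 4.3        # ~one working month: 40 h/week for ~4.3 weeks

doublings_needed = math.log2(one_month_min / current_horizon_min)

for doubling_months in (7, 3):       # long-run rate vs. the faster 2024 rate
    years = doublings_needed * doubling_months / 12
    print(f"{doubling_months}-month doubling time: ~{years:.1f} years from early 2025")
```

With a seven-month doubling time this lands around 2029; at the faster 2024 rate it would arrive considerably sooner.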

One month of dedicated human expertise, the paper suggests, can be enough to start a new company or make scientific discoveries.

However, Joshua Gans, a professor of management at the University of Toronto in Canada who has written about the economics of AI, explains that such predictions are not particularly useful. “Extrapolations are tempting, but there is still so much we don’t know about how AI will actually be used for these predictions to make sense,” he says.

Judging humans versus AI

The team chose the 50% success threshold because it was the most robust to small changes in the data distribution. "If you choose very low or very high thresholds, adding or removing a single successful or failed task changes the estimate considerably," explains co-author Lawrence Chan.

Raising the required reliability from 50% to 80% shortened the measured time horizon by a factor of about five, even though the doubling time and the overall trend line remained similar.
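Sticking with the toy logistic model from the earlier sketch, the horizon at any success threshold p follows from solving intercept + coef * log(t) = logit(p). The coefficients below are invented for illustration, but they show how an 80% threshold yields a much shorter horizon than a 50% one.

```python
# Horizon at an arbitrary success threshold p (toy coefficients, not METR's fit).
import math

def horizon_at(p: float, intercept: float, coef: float) -> float:
    """Human-minutes at which the logistic model predicts success probability p."""
    logit_p = math.log(p / (1 - p))
    return math.exp((logit_p - intercept) / coef)

intercept, coef = 3.5, -0.86   # hypothetical fit; negative coef: longer tasks fail more often
print(f"50% horizon: {horizon_at(0.5, intercept, coef):.0f} min")
print(f"80% horizon: {horizon_at(0.8, intercept, coef):.0f} min")
```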

Over the past five years, gains in the general abilities of LLMs have been driven primarily by increases in scale: more training data, longer training runs and larger numbers of model parameters. The paper attributes progress on the time-horizon metric mainly to improvements in logical reasoning, tool use, error correction and task confidence.

METR's approach to assessing time horizons addresses some of the limitations of existing AI benchmarks, which only loosely match real-world work and quickly become "saturated" as models improve. It provides a continuous, intuitive measure that better captures significant progress over time, says co-author Ben West.

Leading AI models achieve superhuman performance on many benchmark tests but have so far had relatively little economic impact, West explains. METR's latest research offers a partial answer to this puzzle: the best models have a time horizon of about 40 minutes, and there isn't much economically valuable work a person can do in that time, West says.

However, Anton Troynikov, an AI researcher and entrepreneur in San Francisco, California, says that AI would have a greater economic impact if organizations were more willing to experiment with the models and invest in using them effectively.

  1. Kwa, T. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2503.14499 (2025).
