Do AI models produce more original ideas than researchers?


A new study shows that AI models can generate more original research ideas than 50 scientists. Experts evaluate these approaches.



An artificial intelligence (AI)-powered idea-generator system has produced more original research approaches than 50 scientists working independently, according to a recent preprint posted on arXiv¹.

The human- and AI-generated ideas were evaluated by reviewers who did not know who, or what, had created each idea. Reviewers rated the AI-generated concepts as more exciting than the human-authored ones, although the AI's suggestions scored slightly lower on feasibility.

However, the scientists point out that the study, which has not yet been peer-reviewed, has limitations. It focused on a single area of research and required the human participants to come up with ideas on the spot, which probably kept them from producing their best concepts.

Artificial intelligence in science

There is growing interest in using large language models (LLMs) to automate research tasks such as writing papers, generating code and searching the literature. But it has been difficult to assess whether these AI tools can generate fresh research approaches at a level comparable to humans. That is because evaluating ideas is highly subjective and requires specialists who can assess them carefully, says Chenglei Si, co-author of the study and a computer scientist at Stanford University in California. “The best way to contextualize such capabilities is to make a side-by-side comparison,” says Si.

The year-long project is one of the largest efforts so far to evaluate whether large language models, the technology behind tools such as ChatGPT, can produce innovative research approaches, explains Tom Hope, a computer scientist at the Allen Institute for AI in Jerusalem. “There needs to be more work like this,” he says.

The team recruited more than 100 researchers in the field of natural language processing, a branch of computer science concerned with communication between AI and humans. Forty-nine participants were tasked with developing and writing up ideas on one of seven topics within ten days. As an incentive, participants received US$300 for each idea, with a $1,000 bonus for the five best ideas.

At the same time, the researchers built an idea generator using Claude 3.5, an LLM developed by Anthropic in San Francisco, California. They asked their AI tool to find relevant papers on the seven research topics through Semantic Scholar, an AI-powered literature search engine. Drawing on these papers, the AI agent was instructed to generate 4,000 ideas on each research topic and to identify the most original among them.

Human assessors

The researchers then randomly assigned the human and AI-generated ideas to 79 reviewers, who rated each idea for novelty, excitement, feasibility and expected effectiveness. To ensure that the creators of the ideas remained unknown to the reviewers, the researchers used another LLM to edit both types of text so that the writing style and tone were standardized without changing the ideas themselves.

On average, the reviewers rated the AI-generated ideas as more original and exciting than those written by the human participants. But when the researchers looked more closely at the 4,000 ideas the LLM had produced, they found only about 200 that were truly unique, suggesting that the AI became less original the more ideas it generated.

When Si surveyed the participants, most admitted that the ideas they had submitted were only average compared with those they had produced in the past.

The results suggest that LLMs may easily generate more original ideas than the existing literature, says Cong Lu, a machine learning researcher at the University of British Columbia in Vancouver, Canada. However, whether they can surpass the most groundbreaking human ideas remains an open question.

Another limitation of the study is that the written ideas being compared were edited by an LLM, which changed the language and length of the submissions, says Jevin West, a computational social scientist at the University of Washington in Seattle. Such changes may have subtly influenced how reviewers perceived novelty, he says. West adds that pitting researchers against an LLM that can generate thousands of ideas in a few hours might not be an entirely fair comparison. “You have to compare apples with apples,” he says.

Si and his colleagues plan to compare AI-generated ideas with leading conference papers to gain a better understanding of how LLMs compare to human creativity. “We’re trying to encourage the community to think more deeply about what the future should look like when AI can take a more active role in the research process,” he says.

  1. Si, C., Yang, D. & Hashimoto, T. Preprint at arXiv https://doi.org/10.48550/arXiv.2409.04109 (2024).

