Do AI models produce more original ideas than researchers?

An idea generator driven by artificial intelligence (AI) came up with more original research ideas than 50 scientists working independently of one another, according to a preprint recently posted on arXiv¹.
The ideas generated by people and by AI were evaluated by experts who did not know who or what had produced each idea. The experts rated the AI-generated concepts as more exciting than the ideas written by humans, although the AI's proposals scored slightly worse on feasibility.
However, scientists point out that the study, which has not yet been peer-reviewed, has limitations. It focused on a single research area and required the human participants to come up with ideas on the spot, which probably hampered their ability to produce their best concepts.
Artificial intelligence in science
There are growing efforts to investigate how large language models (LLMs) can be used to automate research tasks such as writing papers, generating code and searching the literature. However, it has been difficult to assess whether these AI tools can generate fresh research ideas at a level similar to that of humans. This is because evaluating ideas is highly subjective and requires specialists with the expertise to assess them carefully, says study co-author Chenglei Si, a computer scientist at Stanford University in California. "The best way to contextualize such capabilities is to make a head-to-head comparison," says Si.
The year-long project is one of the largest efforts to evaluate whether large language models, the technology behind tools such as ChatGPT, can produce innovative research ideas, says Tom Hope, a computer scientist at the Allen Institute for AI in Jerusalem. "More work like this is needed," he says.
The team recruited more than 100 researchers in natural language processing, a branch of computer science that deals with communication between AI and humans. Forty-nine participants were tasked with developing and writing up ideas on one of seven topics within ten days. As an incentive, the researchers paid participants US$300 for each idea, with a $1,000 bonus for the five top-scoring ideas.
Meanwhile, the researchers built an idea generator using Claude 3.5, an LLM developed by Anthropic in San Francisco, California. They prompted their AI tool to find papers relevant to the seven research topics using Semantic Scholar, an AI-powered literature search engine. On the basis of these papers, they then prompted their AI agent to generate 4,000 ideas for each research topic and to rank the most original ones.
Human experts
Next, the researchers randomly assigned the human- and AI-generated ideas to 79 experts, who scored each idea on novelty, excitement, feasibility and expected effectiveness. To ensure that the ideas' creators remained unknown to the experts, the researchers used another LLM to edit both types of text, standardizing the writing style and tone without changing the ideas themselves.
On average, the experts rated the AI-generated ideas as more original and more exciting than those of the human participants. However, when the team took a closer look at the 4,000 LLM-produced ideas, they found only about 200 that were truly unique, suggesting that the AI became less original the more ideas it generated.
When the participants were surveyed, most admitted that their submitted ideas were only average compared with ideas they had produced in the past.
The results suggest that LLMs may be able to generate ideas that are slightly more original than those in the existing literature, says Cong Lu, a machine-learning researcher at the University of British Columbia in Vancouver, Canada. But whether they can outperform the most groundbreaking human ideas remains an open question.
Another limitation of the study is that the written ideas being compared had been edited by an LLM, which changed the language and length of the submissions, says Jevin West, a computational social scientist at the University of Washington in Seattle. Such changes could have subtly influenced how the experts perceived novelty, he adds. West also notes that pitting researchers against an LLM that can generate thousands of ideas in a few hours might not be an entirely fair comparison. "You have to compare apples with apples," he says.
Si and his colleagues plan to compare AI-generated ideas with leading conference papers to better understand how LLMs measure up against human creativity. "We are trying to encourage the community to think harder about what the future should look like when AI can take on a more active role in the research process," he says.
-
Si, C., Yang, D. & Hashimoto, T. Preprint at arXiv https://doi.org/10.48550/arxiv.24109 (2024).