Scientists impressed by the latest ChatGPT model, o1

Researchers who helped to test OpenAI's new large language model, OpenAI o1, say that it represents a big step forward in the use of chatbots for science.
"In my area of quantum physics there are much more detailed and more coherent answers" than in the previous model, GPT-4O, says Mario Krenn, head of the Artificial Scientist Lab at the Max Planck Institute for the Physics of Light in Erlangen, Germany. Krenn belonged to a group of scientists in the 'Red Team', who tested the pre -dating from O1 for Openai, a technology company based in San Francisco, California, by trying out the bot and checking them on security concerns.
Since the public launch of ChatGPT in 2022, the large language models that power such chatbots have, on average, grown bigger and better, with more parameters, larger training data sets and stronger performance on a wide range of standardized tests.
OpenAI says that the o1 series represents a fundamental change in the company's approach. Observers report that this AI model stands out because it spends more time in certain learning phases and "thinks" about its answers for longer, making it slower but more capable, especially in areas where right and wrong answers are clearly defined. The company adds that o1 can "think through complex tasks and solve harder problems than previous models in science, coding and mathematics". Currently, o1-preview and o1-mini (a smaller, more cost-effective version suited to coding) are available in test mode to paying customers and certain developers. The company has not published any information on the parameters or the computing power of the o1 models.
Outperforming doctoral students
Andrew White, a chemist at FutureHouse, a non-profit organization in San Francisco that focuses on how AI can be applied to molecular biology, says that in the year and a half since the public release of GPT-4 (https://www.nature.com/articles/d41586-023-00816-5), observers had been surprised and disappointed by a general lack of improvement in chatbots' ability to support scientific tasks. With the o1 series, he says, that has changed.
Remarkably, o1 is the first large language model to beat doctoral-level scholars on the hardest set of questions, the 'diamond' set, in a test called the Graduate-Level Google-Proof Q&A Benchmark (GPQA) [1]. OpenAI says that the PhD-level scholars it tested scored almost 70% on GPQA Diamond, whereas o1 scored 78% overall, with a particularly high result of 93% in physics (see 'Next level'). This is "significantly higher than the next-best documented performance", says David Rein, who was part of the team that developed the GPQA and now works at Model Evaluation and Threat Research, a non-profit organization in Berkeley, California, that assesses the risks of AI. "It seems plausible to me that this represents a significant and fundamental improvement in the model's core skills," he adds.
OpenAI also tested o1 on a qualifying exam for the International Mathematics Olympiad. Its previous best model, GPT-4o, correctly solved only 13% of the problems, whereas o1 scored 83%.
Chains of thought
OpenAI o1 works with chain-of-thought steps: it talks itself through a series of considerations while trying to solve a problem, and corrects itself along the way.
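For readers who want to try this behaviour first-hand, here is a minimal sketch of how one might query o1-preview through OpenAI's Python SDK; the client setup and example prompt are illustrative assumptions, not taken from the article:

```python
# Minimal sketch (assumption, not from the article): querying o1-preview
# through OpenAI's Python SDK. The prompt is illustrative only.
from openai import OpenAI

client = OpenAI()  # expects the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="o1-preview",  # the reasoning model discussed in the article
    messages=[
        {
            "role": "user",
            "content": "A bat and a ball cost $1.10 in total. The bat costs "
                       "$1.00 more than the ball. How much does the ball cost?",
        }
    ],
)

# The model reasons through hidden chain-of-thought steps before replying;
# only the final answer (with a summarized rationale) is returned to the user.
print(response.choices[0].message.content)
```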
OpenAI has chosen to keep the details of any given chain of thought hidden, partly because the chain could contain errors or socially unacceptable "thoughts", and partly to protect company secrets about how the model works. Instead, o1 presents the user with a reconstructed summary of its reasoning alongside its answers. According to White, it is unclear whether the full chain of thought, if it were revealed, would resemble human thinking.
The new capabilities also have their downsides. OpenAI reports that it has received anecdotal feedback that o1 models "hallucinate" (invent false answers) more often than their predecessors, although the company's internal tests for o1 show slightly lower hallucination rates.
The red-team scientists found numerous ways in which o1 was helpful in developing protocols for scientific experiments, but OpenAI says the testers also noted that it "missed safety information on harmful steps, such as failing to highlight explosion hazards or suggesting inappropriate chemical-safety methods, indicating that the model is ill-suited for safety-critical tasks".
"It is still not perfect or reliable enough to not have to be checked exactly," says White. He adds that O1 is more suitable for to lead experts as beginners . "For a beginner, it is beyond her immediate ability to look at a protocol generated by O1 and to recognize that it is" nonsense "," he says.
A problem-solver for science
Krenn believes that o1 will accelerate science by helping to scan the literature, spot gaps and suggest interesting research directions for future studies. He has integrated o1 into a tool he developed that does just this, called SciMuse [2]. "It generates much more interesting ideas than GPT-4 or GPT-4o," he says.
Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some of the coding from his PhD project, which calculated the masses of black holes. "I was just overwhelmed," he says, noting that o1 took about an hour to accomplish what had taken him many months.
Catherine Brownstein, a geneticist at Boston Children's Hospital in Massachusetts, says the hospital is currently testing several AI systems, including o1-preview, for applications such as uncovering connections between patient characteristics and genes for rare diseases. She says o1 "is more accurate and gives options I didn't think were possible from a chatbot".
References
1. Rein, D. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2311.12022 (2023).
2. Gu, X. & Krenn, M. Preprint at arXiv https://doi.org/10.48550/arXiv.2405.17044 (2024).