After defeating humans at strategy board games such as Go, Google DeepMind now claims to be on the verge of beating the world's best students at solving math problems.
The London-based machine-learning company announced on 25 July that its artificial intelligence (AI) systems had solved four of the six problems given to contestants at the 2024 International Mathematical Olympiad (IMO) in Bath, United Kingdom. The AI produced rigorous, step-by-step proofs that were graded by two top mathematicians and earned a score of 28 out of 42, just one point short of gold-medal territory.
“It's obviously a very significant advance,” says Joseph Myers, a mathematician from Cambridge, UK, who, along with Fields Medalist Tim Gowers, reviewed the solutions and helped select the original problems for this year's IMO.
DeepMind and other companies are racing towards the ultimate goal of building machines that can produce proofs that matter and solve research questions in mathematics. The problems set at the IMO, the world's premier competition for young mathematicians, have become a benchmark for progress towards that goal and are considered a "grand challenge" for machine learning, the company said.
“This is the first time an AI system has achieved medal-level performance,” Pushmeet Kohli, vice-president of AI for science at DeepMind, said in a press briefing. “This is an important milestone in building advanced theorem provers,” he added.
Beyond geometry
Just a few months earlier, in January, DeepMind's AlphaGeometry system had achieved medal-level performance on one type of IMO problem: those in Euclidean geometry. The first AI to perform at gold-medal level across the whole test, including questions in algebra, combinatorics and number theory, which are generally considered more challenging than geometry, will be eligible for a US$5-million award, the AI Mathematical Olympiad (AIMO) Prize. (The prize has strict criteria, such as making the source code public and working with limited computing power, which means that DeepMind's current effort would not qualify.)
In their latest effort, the researchers used AlphaGeometry2, an improved and faster version of that record-setting system, to solve the geometry problem in under 20 seconds, says DeepMind computer scientist Thang Luong.
For the other types of question, the team developed an entirely new system called AlphaProof. AlphaProof solved two of the competition's algebra problems and one in number theory, although it took three days to do so. (Contestants at the real IMO get two sessions of 4.5 hours each.) It was unable to solve the two problems in combinatorics, another area of mathematics.

Researchers have had mixed results when using language models, the kind of systems that power chatbots such as ChatGPT, to answer mathematical questions. Sometimes the models give the right answer but cannot explain their reasoning; at other times they spout nonsense.
Just last week, a team of researchers from the software companies Numina and Hugging Face used a language model to win an intermediate AIMO 'progress prize' based on simplified versions of IMO problems. The teams have made their entire systems open source and available for other researchers to download. But the winners told Nature that language models alone would probably not be enough to tackle more difficult problems.
Trial and error
AlphaProof combines a language model with the “AlphaZero” reinforcement-learning engine that the company has used with great success on board games such as Go, as well as on some specific mathematical problems. In reinforcement learning, a neural network learns by trial and error, which works well when its answers can be evaluated against an objective standard. To that end, AlphaProof was trained to read and write proofs in a formal language called Lean, which is used by the proof-assistant software package of the same name, popular with mathematicians. AlphaProof then checked whether its outputs were correct by running them through the Lean package, which also helped fill in some of the steps in the code.
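To give a sense of what that machine-checkable format looks like, here is a minimal sketch of a Lean 4 statement and proof written against the mathlib library; the theorem and its name are invented for illustration and are not from AlphaProof's training data, but any proof in this form can be verified automatically by the Lean checker, providing the objective pass/fail signal that reinforcement learning relies on.

```lean
import Mathlib

-- Illustrative example only: a simple statement and a formal proof
-- that the Lean checker can verify mechanically.
-- `sq_nonneg` and `add_nonneg` are standard mathlib lemmas.
theorem sum_of_squares_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 :=
  add_nonneg (sq_nonneg a) (sq_nonneg b)
```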
Training a language model requires huge amounts of data, but few mathematical proofs were available in Lean. To get around this, the team built a separate network that attempted to translate an existing record of a million problems written in natural language, but without human-written solutions, into Lean, says Thomas Hubert, a DeepMind machine-learning researcher who co-led the development of AlphaProof. “Our approach was: can we learn to prove things even if we didn’t originally train on human-written proofs?” (The company took a similar approach with Go, in which its AI learnt the game by playing against itself rather than from human play.)
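As a rough sketch of what that translation step produces, assuming a made-up competition-style problem rather than one from the actual dataset: the formalization network only has to turn the English statement into a Lean theorem statement, and the proof itself is left open, marked here with Lean's `sorry` placeholder, for the prover to find and for Lean to check.

```lean
import Mathlib

-- Hypothetical natural-language problem, invented for illustration:
--   "Show that n^2 + n is even for every natural number n."
-- One possible Lean formalisation of the statement alone; `sorry`
-- marks the missing proof that the prover still has to supply.
theorem n_sq_add_n_even (n : ℕ) : Even (n ^ 2 + n) := by
  sorry
```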
Magic keys
Many of the Lean translations did not make sense, but enough were good enough to bring AlphaProof to the point where it could begin its reinforcement-learning cycles. The results were much better than expected, Gowers said at the press briefing. "Many problems at the IMO have this magic-key property. The problem looks hard at first until you find a magic key that unlocks it," said Gowers, who works at the Collège de France in Paris.
In some cases, AlphaProof seemed able to provide that extra spark of creativity, picking one correct step out of an infinitely large set of possibilities. But further analysis is needed to determine whether its answers were less surprising than they seemed, Gowers added. A similar debate followed the surprising 'move 37' that DeepMind's AlphaGo bot played during its famous 2016 victory over the world's best human Go player, a turning point for AI.
Whether the techniques can be refined to work at research level in mathematics remains to be seen, Myers said at the press briefing. “Can it extend to other types of mathematics where there might not be millions of problems to train on?”
"We're at the point where they can prove not only open research problems, but also problems that are very challenging to the very best young mathematicians in the world," said DeepMind computer specialist David Silver, who was the lead researcher developing AlphaGo in the mid-2010s.