

DeepMind and OpenAI claim gold in International Mathematical Olympiad

Two AI models have achieved gold medal standard for the first time in a prestigious competition for young mathematicians, and their developers claim these AIs could soon crack tough scientific problems

By Alex Wilkins

22 July 2025

AIs are getting better at maths problems

Andresr/Getty Images

Experimental AI models from Google DeepMind and OpenAI have achieved a gold-level performance in the International Mathematical Olympiad (IMO) for the first time.

The companies are hailing the moment as an important milestone for AIs that might one day solve hard scientific or mathematical problems, but mathematicians are more cautious because details of the models' results and how they work haven't been made public.

The IMO, one of the world's most prestigious competitions for young mathematicians, has long been seen by AI researchers as a litmus test of the kind of mathematical reasoning that AI systems tend to struggle with.

After last year's competition, held in Bath, UK, Google DeepMind announced that AI systems it had developed, called AlphaProof and AlphaGeometry, had together achieved a silver medal-level performance, but its entries weren't graded by the competition's official markers.

Before this year's contest, which was held in Queensland, Australia, companies including Google, Huawei and TikTok-owner ByteDance, as well as academic researchers, approached the organisers to ask whether they could have their AI models' performance officially graded, says Dolinar, the IMO's president. The IMO agreed, with the proviso that the companies wait to announce their results until 28 July, after the IMO's closing ceremony had been completed.


OpenAI also asked if it could participate in the competition, but after it was informed about the official scheme, it didn’t respond or register an entry, says Dolinar.

On 19 July, OpenAI announced that a new AI it had developed had achieved a gold medal score, graded by three former IMO medallists separately from the official competition. The AI answered five out of six questions correctly in the same 4.5-hour time limit as the contestants, OpenAI said.

Two days later, Google DeepMind announced that its own model, called Gemini Deep Think, had achieved gold with the same score and time limit. Dolinar confirmed that this result was graded by the IMO's official markers.

Unlike Google's AlphaProof and AlphaGeometry systems, which were crafted especially for the competition and worked with questions and answers written in a computer programming language called Lean, both Google and OpenAI's models this year worked entirely in natural language.

Working in Lean meant the AI's output could be instantly checked for correctness, but Lean is harder for non-experts to read. Luong, a researcher at Google who worked on Gemini Deep Think, says the natural language approach could produce more understandable answers, as well as being applicable to more generally useful AI systems.
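To illustrate what "instantly checked for correctness" means, here is a toy Lean statement, unrelated to either company's systems: if the file compiles, the Lean kernel has mechanically verified every step of the proof, with no human reading required.

```lean
-- A tiny theorem: adding zero on the right leaves a natural number unchanged.
-- Successful compilation is itself the verification.
theorem add_zero_right (n : Nat) : n + 0 = n := rfl
```

A natural-language proof of the same fact would need a human (or another model) to read and judge it, which is the trade-off the article describes.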

Luong says the ability to verify solutions within a large language model has been made possible by progress in reinforcement learning, a training method in which an AI is told only what success looks like and must discover how to achieve it through trial and error. This method was key to Google's previous success with its game-playing AIs, such as AlphaZero.
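The trial-and-error idea can be shown with a minimal, self-contained sketch: an epsilon-greedy bandit agent that is never told the payoff rules and learns which action succeeds purely from reward feedback. This is a generic illustration of reinforcement learning, not the training setup used by either company.

```python
import random

def train_bandit(true_payoffs, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays best purely from reward feedback (trial and error).

    The agent never sees `true_payoffs`; it only observes a 0/1 reward after
    each pull, mirroring the article's description of being taught what
    success looks like but not the rules.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_payoffs)  # running reward estimate per arm
    counts = [0] * len(true_payoffs)
    for _ in range(steps):
        if rng.random() < epsilon:                        # explore at random
            arm = rng.randrange(len(true_payoffs))
        else:                                             # exploit best estimate
            arm = max(range(len(true_payoffs)), key=lambda a: estimates[a])
        reward = 1 if rng.random() < true_payoffs[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

est = train_bandit([0.2, 0.5, 0.8])
best_arm = max(range(3), key=lambda a: est[a])  # the agent settles on arm 2
```

After a few thousand trials the agent's estimates concentrate on the highest-paying arm, despite only ever receiving a bare success/failure signal.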

Google's model also considers multiple solutions at once, in a mode called parallel thinking, and was additionally trained on a dataset of maths problems specifically useful for the IMO, says Luong.
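The parallel-thinking idea, exploring several candidate solutions and then selecting one, can be sketched generically. The strategies and checker below are hypothetical placeholders, not Gemini internals.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_in_parallel(problem, strategies, check):
    """Run several solution strategies concurrently and return the first
    candidate answer that passes the checker (a generic sketch of the idea)."""
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: s(problem), strategies))
    for answer in candidates:
        if check(problem, answer):
            return answer
    return None

# Toy demo: "solve" x + 3 = 10 with two hypothetical strategies.
strategies = [
    lambda p: p["rhs"] + p["addend"],   # incorrect manipulation -> 13
    lambda p: p["rhs"] - p["addend"],   # correct manipulation -> 7
]
problem = {"addend": 3, "rhs": 10}
check = lambda p, x: x + p["addend"] == p["rhs"]
answer = solve_in_parallel(problem, strategies, check)  # 7
```

Only the candidate that survives the check is returned, which is why pairing parallel generation with some form of verification matters.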

OpenAI has released few details about its system, apart from saying that it also uses reinforcement learning and "experimental research methods".

"The progress is promising, but not performed in a controlled scientific fashion, and so I will not be able to assess it at this stage," says a mathematician at the University of California, Los Angeles. "Perhaps once the companies involved release some papers with more data, and hopefully enough access to the model for others to replicate the results, one can say something more definitive, but, for now, we largely have to trust the companies themselves for the claimed results."

Williamson, at the University of Sydney in Australia, agrees. "I think it is remarkable that this is where we're at. It is frustrating how little detail outsiders are provided with regarding internals," he says.

While systems working in natural language could be useful for non-mathematicians, they could also present a problem if models produce long proofs that are hard to check, says one of the organisers of this year's IMO. "If AIs are ever to produce solutions to significant unsolved problems that might plausibly be correct, but might also have a few subtle but fatal errors hidden accidentally, or potentially deliberately by a misaligned AI, having those AIs also generate a formal proof is key to having confidence in the correctness of a long AI output before attempting to read it."

Both companies say that, in the coming months, they will offer these systems to mathematicians for testing before releasing them to the wider public. The models could soon help with harder scientific research problems, says another Google researcher who worked on Gemini Deep Think. "There are going to be many, many unsolved problems within reach," he says.

