GPT-5 is the latest version of OpenAI’s large language model Cheng Xin/Getty Images
AI’s latest step forward isn’t so much a giant leap as a tentative shuffle. OpenAI has released its newest AI model, GPT-5, two years after rolling out GPT-4, whose success has driven ChatGPT towards world domination. But despite promises of a similar jump in capability, GPT-5 appears to show little improvement over other leading AI models, hinting that the industry may need a fresh approach to build more intelligent AI systems.
OpenAI’s own pronouncements hail GPT-5 as a “significant leap in intelligence” from the company’s previous models, showing apparent improvements in programming, mathematics, writing, health information and visual understanding. It also promises less frequent hallucinations, in which an AI presents false information as true. On an internal benchmark measuring “performance on complex, economically valuable knowledge work”, OpenAI says GPT-5 is “comparable to or better than experts in roughly half the cases” across tasks spanning over 40 occupations including law, logistics, sales and engineering.
However, GPT-5’s performance on public benchmarks isn’t dramatically better than leading models from other AI companies, like Anthropic’s Claude or Google’s Gemini. It has improved on GPT-4, but the difference for many benchmarks is smaller than the leap from GPT-3 to GPT-4. Many ChatGPT customers have also been unimpressed, with examples of GPT-5 failing to answer seemingly simple queries receiving widespread attention on social media.
“A lot of people hoped that there would be a breakthrough, and it’s not a breakthrough,” says Mirella Lapata at the University of Edinburgh, UK. “It’s an upgrade, and it feels kind of incremental.”
The most comprehensive measures of GPT-5’s performance come from OpenAI itself, since only it has full access to the model. Few details about the internal benchmark have been made public, says Anna Rogers at the IT University of Copenhagen in Denmark. “Hence, it is not something that can be seriously discussed as a scientific claim.”
In a press briefing before the model’s launch, OpenAI CEO Sam Altman claimed “GPT-5 is the first time that it really feels like talking to an expert in any topic, like a PhD-level expert.” But this isn’t supported by benchmarks, says Rogers, and it is unclear how a PhD relates to intelligence more generally. “Highly intelligent people don’t necessarily have PhD degrees, and having such a degree doesn’t necessarily guarantee high intelligence,” says Rogers.
GPT-5’s apparently modest improvements might be a sign of wider difficulties for AI developers. Until recently, it was thought that large language models (LLMs) get more capable with more training data and computing power. This no longer appears to be borne out by the results of the latest models, and companies have failed to find better AI system designs than those that have powered ChatGPT. “Everybody has the same recipe right now and we know what the recipe is,” says Lapata, referring to the process of pre-training models with a large amount of data and then making adjustments with post-training processes.
However, it is difficult to say how close LLMs are to stagnating because we don’t know exactly how models like GPT-5 are designed, says Nikolaos Aletras at the University of Sheffield, UK. “Trying to make generalisations about [whether] large language models have hit a wall might be premature. We can’t really make these claims without any information about the technical details.”
OpenAI has been working on other ways to make its product more efficient, such as GPT-5’s new routing system. Unlike previous versions of ChatGPT, where people could choose which AI model to use, GPT-5 now scans requests and directs each one to a specific model that will use an appropriate amount of computational power.
This approach might be adopted more widely, says Lapata. “The reasoning models use a lot of [computation], and this takes time and money,” she says. “If you can answer it with a smaller model, we will see more of that in the future.” But the move has angered some ChatGPT customers, prompting Altman to say the company is looking at improving the routing process.
There are more positive signs for the future of AI in a separate OpenAI model that has achieved gold medal scores in elite mathematical and coding competitions in the past month, something that top AI models couldn’t do a year ago. While details of how the model works are again scant, OpenAI said its success suggests the system has more general reasoning capabilities.
These competitions are useful for testing models on data they haven’t seen during their training, says Aletras, but they are still narrow tests of intelligence. Increasing a model’s performance in one area might also make it worse at others, says Lapata, which can be difficult to keep track of.
One area where GPT-5 has significantly improved is price: it is now far cheaper than other models. Anthropic’s best Claude model, for example, costs more to process the same number of requests at the time of writing. But this could present its own problems in the long run, if OpenAI’s income doesn’t cover the vast costs it has committed to building and running new data centres. “The pricing is insane. It’s so cheap I don’t know how they can afford this,” says Lapata.
Competition between the top AI models is fierce, especially with the expectation that the first model to pull ahead of the others will take most of the market share. “All these big companies, they’re trying to be the one winner, and this is hard,” says Lapata. “You’re a winner for three months.”