New Scientist

Technology

AIs get worse at answering simple questions as they get bigger

Using more training data and computational power is meant to make AIs more reliable, but tests suggest large language models actually get less reliable as they grow

By Chris Stokel-Walker

25 September 2024

Large language models are capable of answering a wide range of questions – but not always accurately

Jamie Jin/Shutterstock

Large language models (LLMs) seem to get less reliable at answering simple questions when they get bigger and learn from human feedback.

AI developers try to improve the power of LLMs in two main ways: scaling up (giving them more training data and more computational power) and shaping up, or fine-tuning them in response to human feedback.

Hernández-Orallo at the Polytechnic University of Valencia, Spain, and his colleagues examined the performance of LLMs as they scaled up and shaped up. They looked at OpenAI's GPT series of chatbots, Meta's LLaMA AI models, and BLOOM, developed by a group of researchers called BigScience.

The researchers tested the AIs by posing five types of task: arithmetic problems, solving anagrams, geographical questions, scientific challenges and pulling out information from disorganised lists.

They found that scaling up and shaping up can make LLMs better at answering tricky questions, such as rearranging the anagram "yoiirtsrphaepmdhray" into "hyperparathyroidism". But this isn't matched by improvement on basic questions, such as "what do you get when you add together 24427 and 7120", which the LLMs continue to get wrong.
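Both example tasks are easy to verify mechanically, which is what makes the models' failures on the simpler one striking. Here is a minimal sketch (illustrative only, not the researchers' benchmark code) that checks the two answers above:

```python
# Verify the article's two example tasks: an anagram check and simple addition.
# This is an illustrative sketch, not the code used in the study.
from collections import Counter

def is_anagram(scrambled: str, candidate: str) -> bool:
    """Two strings are anagrams if they contain the same letters with the same counts."""
    return Counter(scrambled) == Counter(candidate)

# The "tricky" task: the scrambled letters do rearrange into a real word.
print(is_anagram("yoiirtsrphaepmdhray", "hyperparathyroidism"))  # True

# The "basic" task the models continue to get wrong: five-digit addition.
print(24427 + 7120)  # 31547
```

A deterministic check like this is trivial for conventional software, which underlines the article's point: the difficulty ordering that humans and classical programs experience does not match where LLMs fail.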


While their performance on difficult questions got better, the likelihood that an AI system would avoid answering any one question, because it couldn't, dropped. As a result, the likelihood of an incorrect answer rose.

The results highlight the dangers of presenting AIs as omniscient, as their creators often do, says Hernández-Orallo, and which some users are too ready to believe. "We have an overreliance on these systems," he says. "We rely on them and we trust them more than we should."

That is a problem because AI models aren't honest about the extent of their knowledge. "Part of what makes human beings super smart is that sometimes we don't realise that we don't know something that we don't know, but compared to large language models, we are quite good at realising that," says a researcher at the University of Oxford. "Large language models do not know the limits of their own knowledge."

OpenAI, Meta and BigScience didn't respond to New Scientist's request for comment.

Journal reference: Nature

