
Technology

University examiners fail to spot ChatGPT answers in real-world test

ChatGPT-written exam submissions for a psychology degree mostly went undetected and tended to get better marks than real students' work

By Chris Stokel-Walker

26 June 2024

Exams taken in person make it harder for students to cheat using AI

Trish Gant / Alamy

Ninety-four per cent of university exam submissions created using ChatGPT weren't detected as being generated by artificial intelligence, and these submissions tended to get higher scores than real students' work.

Peter Scarfe at the University of Reading, UK, and his colleagues used ChatGPT to produce answers to 63 assessment questions on five modules across the university's psychology undergraduate degrees. Students sat these exams at home, so they were allowed to look at notes and references, and they could potentially have used AI, although this wasn't permitted.

The AI-generated answers were submitted alongside real students' work, accounting for, on average, 5 per cent of the total scripts marked by academics. The markers weren't informed that they were checking the work of 33 fake students, whose names were themselves generated by ChatGPT.

The assessments included two types of questions: short answers and longer essays. The prompts given to ChatGPT began with the words "Including references to academic literature but not a separate reference section", then copied the exam question.

Across all modules, only 6 per cent of the AI submissions were flagged as potentially not being a student's own work, and in some modules no AI-generated work was flagged as suspicious at all. "On average, the AI responses gained higher grades than our real student submissions," says Scarfe, though there was some variability across modules.


"Current AI tends to struggle with more abstract reasoning and integration of information," he adds. But across all 63 AI submissions, there was an 83.4 per cent chance that the AI work outscored that of the students.

The researchers claim that their work is the largest and most robust study of its kind to date. Although the study only checked work on the University of Reading's psychology degree, Scarfe believes it is a concern for the whole academic sector. "I have no reason to think that other subject areas wouldn't have just the same kind of issue," he says.

"The results show exactly what I'd expect to see," says Thomas Lancaster at Imperial College London. "We know that generative AI can produce reasonable-sounding responses to simple, constrained textual questions." He points out that unsupervised assessments including short answers have always been susceptible to cheating.

The heavy marking workload expected of academics also doesn't help their ability to pick up AI fakery. "Time-pressured markers of short answer questions are highly unlikely to raise AI misconduct cases on a whim," says Lancaster. "I am sure this isn't the only institution where this is happening."

Tackling it at source is going to be near-impossible, says Scarfe. So the sector must instead reconsider what it is assessing. "I think it's going to take the sector as a whole to acknowledge the fact that we're going to have to be building AI into the assessments we give to our students," he says.

Journal reference: PLoS One

