AIs can guess where Reddit users live and how much they earn

Large language models such as GPT-4 were able to identify people’s personal information by analysing their posts on social media

By Chris Stokel-Walker

1 November 2023

We may reveal more of ourselves on the internet than we realise

Brain light / Alamy

Large language models (LLMs) like GPT-4 can identify a person’s age, location, gender and income with up to 85 per cent accuracy simply by analysing their posts on social media.

Researchers at ETH Zurich in Switzerland got nine LLMs to pore through a database of Reddit posts and pick up identifying information in the way users wrote.

Staab and Vero randomly selected 1500 profiles of users who engaged on the platform, then narrowed these down to 520 users for whom they could confidently identify attributes like a person’s place of birth, their income bracket, gender and location, either in their profiles or posts.

When given the posting history of those users, some of the LLMs were able to identify many of these attributes with a high degree of accuracy. GPT-4 achieved the highest overall accuracy, at 85 per cent, while Llama-2-7b, a comparatively low-powered LLM, was the least accurate model, at 51 per cent.

“It tells us that we give a lot of our personal information away on the internet without thinking about it,” says Staab. “Many people would not assume that you can directly infer their age or their location from how they write, but LLMs are quite capable.”

Sometimes, personal details were explicitly stated in the posts. For example, some users post their income in forums about financial advice. But the AIs also picked up on subtler cues, like location-specific slang, and could estimate a salary range from a user’s profession and location.

Some characteristics were easier for the AIs to discern than others. GPT-4 was 97.8 per cent accurate at guessing gender, but only 62.5 per cent accurate on income.

“We’re only just beginning to understand how privacy might be affected by use of LLMs,” says , at the University of Surrey, UK.

Reference:

arXiv
