For over seven decades, the Turing test was considered the gold standard for artificial intelligence: can a machine convince a human that it is actually human? It turns out it can. A groundbreaking study by researchers at UC San Diego has found that AI can seem more human than real humans in the test, and the implications are far more unsettling than you might think. The findings are published in the Proceedings of the National Academy of Sciences.
The progress no one saw coming
A new University of California, San Diego study found evidence that a modern artificial intelligence system can pass the Turing test. The test is a long-standing scientific benchmark that asks whether a machine can imitate human conversation so convincingly that people cannot reliably tell it apart from a real person. It was proposed in 1950 by Alan Turing, the British mathematician often called the ‘father of computer science’.
In a series of experiments, researchers found that people could not reliably tell the difference between humans and advanced large language models (LLMs).
“What we found is that, if given the right prompts, advanced LLMs can exhibit the same tone, directness, humour and fallibility as humans. While we know LLMs can easily produce knowledge on nearly every topic, this test showed that they can also convincingly display social behavioural traits, which have major implications for how we think of AI,” said the study’s corresponding author, Cameron Jones, who conducted the research while earning his doctorate in Cognitive Science from UC San Diego.
The study
During the test, a participant chats simultaneously with two other parties: one human and one AI system. The human ‘interrogator’ must then decide which of the two is the human. Across the experiments, participants faced four different AI systems. Two were state-of-the-art LLMs, GPT-4.5 and LLaMa-3.1-405B. The other two served as baselines for comparison: the older model GPT-4o and ELIZA, a classic rule-based chatbot from the 1960s.
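When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time. That means interrogators picked it over the actual human well above the 50% rate expected by chance. LLaMa-3.1-405B was judged human 56% of the time. The baseline systems performed much worse: ELIZA and GPT-4o were selected as human only about 23% and 21% of the time, respectively.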
“The Turing test started as a way to ask whether machines could rival human intelligence. But now we know AI can answer many questions faster and more accurately than people can, so the real issue is not raw brainpower. Seeing that machines can pass the test — and seeing how they pass it — forces us to rethink what it measures. Increasingly, it is measuring humanlikeness,” said study co-author Ben Bergen, a professor of cognitive science at UC San Diego.
The LLMs were given a ‘persona’ prompt instructing them to adopt a specific human character and communication style. Bergen explained that the LLMs were not winning through superior knowledge. Instead, they won through humanlike traits, including making mistakes the way people do. “These traits are not the kinds of maths and logic problem-solving intelligence that I think Turing was imagining.”
The prompts matter
The researchers also noted that the models were much less likely to be mistaken for humans when the persona instructions were not provided. Without them, GPT-4.5 was judged human only 36% of the time and LLaMa-3.1-405B only 38% of the time. The baseline systems ELIZA (23%) and GPT-4o (21%) were chosen as human even less often.
However, the same systems could pass as human when they were given detailed instructions about which character to play. This suggests that while the models can behave in convincingly human ways, they often need humans to tell them how.
“They have the ability to appear human-like, but perhaps not as much the ability to figure out what it would take to appear human-like,” Bergen said.
Trust, deception, and the rise of ‘counterfeit people’
These findings matter because it is becoming harder to trust that the people we talk to online are actually human. AI can now pass as a person in a five- or 15-minute conversation, which is both exciting and horrifying. “It is relatively easy to prompt these models to be indistinguishable from humans. We need to be more alert; when you interact with strangers online, people should be much less confident that they know they are talking to a human rather than an LLM,” said Jones, who is now an assistant professor of psychology at Stony Brook University.
Elaborating on the darker risks, he added, “The Turing test is a game about lying for the models. One of the implications is that models seem to be really good at that.”
Bergen said that being unable to discern whether you are interacting with a human or a bot can have serious consequences. “There are lots of people who would like to use bots to persuade people to share their Social Security numbers, vote for their party, or buy their product,” he said.