GPT-4.5 fools humans in Turing Test

A U.S. study found that OpenAI's GPT-4.5 passed the Turing Test: participants judged it to be the human 73% of the time, which suggests the AI convincingly mimics human-like intelligence. Researchers, however, caution that this does not mean the model truly "thinks" like a human.

Agencies and A News TECH

Published April 16, 2025

A new study in the United States has revealed that an AI chatbot passed a critical test suggesting it may possess human-like intelligence.

Conducted by two researchers from the University of California, San Diego, the study, which has yet to be peer-reviewed, used the Turing Test to evaluate four AI systems, three of them large language models (LLMs).

In the results, participants judged OpenAI's GPT-4.5 to be the human in more than 70% of conversations.

WHAT IS THE TURING TEST?

Proposed by British mathematician and computer scientist Alan Turing in 1950, the Turing Test is an experiment designed to determine whether a machine can think like a human.

In the test, a human interrogator exchanges written messages with two subjects, one human and one AI, and tries to identify which is the human. If the interrogator cannot reliably do so, the AI is considered to have "human-like intelligence."

WHAT DID THE STUDY FIND?

In the preliminary study, published in March, four AI systems were tested: ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5. Each of the 284 participants held five-minute conversations with two "witnesses," one human and one AI.

Participants mistook GPT-4.5 for a human 73% of the time. LLaMa-3.1-405B followed at 56%. ELIZA and GPT-4o scored much lower, at 23% and 21%, respectively.

CRITICISM OF THE TURING TEST

Though long considered a symbolic benchmark for AI intelligence, the Turing Test has faced increasing scrutiny. Critics highlight four main concerns:

Behavior vs. Thought: It evaluates outward behavior, not actual thinking capacity.
Mind as machine: Turing's mechanistic view of the brain is still debated.
Different operations: Since machines and humans function differently, comparisons may be flawed.
Limited scope: Evaluating one type of behavior isn't enough to measure general intelligence.

DOES IT REALLY THINK LIKE A HUMAN?

The study's authors acknowledge that while GPT-4.5 passed the test, this doesn't mean it possesses human intelligence. Rather, it merely succeeded in "appearing human."

They also note that the short conversations and the specific "personalities" the AI models were instructed to adopt may have influenced the results.

Experts agree that GPT-4.5 is not yet as intelligent as humans, though it has shown a convincing ability to imitate human conversation in some cases.