Advanced AI Models Collapse on Classic Psychology Test, Raising Questions About Human-Level AI

The machines that can do so much remain unable to do something a child learns naturally.
Advanced AI systems excel at complex tasks but collapse on basic attention tests, revealing a fundamental cognitive gap.

For all their remarkable fluency with language, code, and complex reasoning, the latest generation of AI systems, including GPT-5, has stumbled on one of psychology's most enduring measures of basic cognition: the sustained attention test. Researchers found not a gradual decline but a near-total collapse in performance as cognitive demands increased, suggesting that the ability to hold focus over time may rest on principles fundamentally different from those powering today's most capable models. The finding invites a quieter, more unsettling question beneath the headlines about AI progress: how much of what we call intelligence depends not on raw capability, but on the humble, persistent act of paying attention?

  • AI systems celebrated for writing code and analyzing medical images nearly fell apart when asked to sustain focus across longer, more demanding attention sequences — a task human children master naturally.
  • The failure was not gradual but a sharp collapse, suggesting a structural gap in how current AI architectures direct and hold cognitive resources rather than a simple deficit in processing power.
  • Researchers now suspect the path to artificial general intelligence is blocked not by insufficient scale alone, but by unresolved architectural limitations that bigger models and more training data cannot overcome.
  • AI labs and technology companies tracking AGI timelines are already recalibrating their expectations, with some calling for fundamentally new design approaches rather than incremental improvements.
  • The findings reframe the entire debate around what separates today's AI from human-level general intelligence, shifting focus from what machines can do to what they cannot yet sustain.

The latest AI systems can synthesize research, analyze medical imaging, and generate fluent code — yet when researchers administered classic psychology attention tests developed in the mid-twentieth century, the results were startling. Models including GPT-5 did not merely underperform; they collapsed. The kind of sustained, filtered focus that humans apply when reading a long document or tracking a conversation proved, under increasing cognitive demand, to be well beyond their reach.

The pattern was consistent and revealing. At modest levels of complexity, the systems held their own. But as tests required models to maintain attention across longer sequences and filter irrelevant information, performance did not decline gradually; it broke down suddenly. This exposed something important: these systems excel at parallel pattern-matching and well-defined problem-solving, but sustained attention appears to operate on entirely different principles, ones the current generation has not yet developed.

The implications reach further than a single benchmark. Many researchers had assumed the distance between today's AI and artificial general intelligence was primarily a matter of scale: more data, larger models, greater compute. These results challenge that assumption directly, pointing instead to fundamental architectural gaps that scaling alone cannot close. If sustained attention is an unsolved core capability, then reaching human-level AI may require not just an adjusted timeline but a rethinking of how these systems are built.

What lingers in the findings is something almost philosophical: machines that can do so much remain unable to do something a child learns without instruction. That quiet failure may be telling us more about the nature of intelligence itself than any benchmark these systems have managed to pass.

The latest generation of artificial intelligence systems can write code, analyze medical imaging, and synthesize research across disciplines. Yet when researchers sat them down to take a test that has been measuring human attention for decades, something unexpected happened: the machines nearly fell apart.

Advanced AI models, including GPT-5, came close to total collapse on classic psychology tests designed to measure sustained attention: the kind of focused concentration that a person uses when reading a long document, following a conversation, or tracking a moving object. The tests themselves are not new. Psychologists have been using them since the mid-twentieth century to understand how human minds filter information and maintain focus over time. What made the recent results striking was not that the AI systems failed, but how completely they failed, and what that failure suggests about the gap between current machine intelligence and the kind of general reasoning humans take for granted.

The pattern that emerged from the testing was revealing. When cognitive demands remained modest, the AI systems performed adequately. But as the tests increased in complexity—as they required the models to sustain attention across longer sequences, to filter out irrelevant information, and to maintain focus on a primary task—performance degraded sharply. This was not a gradual decline. It was a collapse. The systems that had demonstrated remarkable capability in other domains suddenly struggled with what should have been a straightforward cognitive task.

What makes this finding significant is what it exposes about the nature of current AI capabilities. These systems excel at pattern matching, at processing vast amounts of information in parallel, at generating fluent text and solving well-defined problems. But sustained attention—the ability to maintain focus on a single thread of thought while filtering out distractions—appears to operate on different principles. It requires something the current generation of models has not yet developed: a mechanism for deliberately directing cognitive resources toward a chosen target and keeping them there.

The implications ripple outward. Researchers and industry observers have long debated what separates current AI from artificial general intelligence—the hypothetical point at which machines would match human cognitive flexibility across all domains. Many assumed the remaining gaps were primarily about scale: bigger models, more training data, more computational power. These results suggest otherwise. They indicate that fundamental architectural limitations may be at play, limitations that cannot be solved simply by making existing systems larger or faster.

The attention test results have already begun to reshape conversations in AI research labs and among technology companies tracking progress toward AGI. If sustained attention is a core cognitive capability that current approaches have not adequately solved, then the timeline for achieving human-level artificial general intelligence may need to be recalibrated. The path forward would require not just incremental improvements but potentially new approaches to how AI systems are designed and trained.

For now, the machines that can do so much remain unable to do something that a human child learns to do naturally. That gap, small as it might seem, may be telling us something important about what we still do not understand about intelligence itself.

The collapse occurs as cognitive demands increase, indicating AI struggles with sustained focus rather than raw computational power.
— Research findings on AI attention mechanisms
The Hearth Conversation: Another perspective on the story
Inventor

So these are systems that can do genuinely sophisticated work—medical analysis, coding, research synthesis. How do we square that with failing a basic attention test?

Model

The key is that those sophisticated tasks don't necessarily require sustained attention in the way the psychology test does. A model can analyze an image or generate code by processing patterns it has learned. But holding focus on a single thread while filtering noise—that's a different operation entirely.

Inventor

Is this a surprise to the researchers, or was this expected?

Model

The collapse itself seems to have been sharper than anticipated. People knew attention was a challenge, but the near-total failure at higher cognitive loads suggests something more fundamental is broken in how these systems are built.

Inventor

Does this mean we're further from AGI than we thought?

Model

It suggests the path is different than we thought. We might have been assuming the remaining gaps were about scale—just bigger, faster systems. This says no, there's an architectural problem that size alone won't fix.

Inventor

What would fixing it look like?

Model

That's the open question. It might require rethinking how these models direct their own computational resources, or how they maintain state over long sequences. It's not a tweak. It's potentially a redesign.

Inventor

And if it can't be fixed?

Model

Then we're looking at machines that are very good at specific tasks but may never achieve the kind of flexible, general reasoning that humans have. That's a very different future than the one many people have been imagining.
