Scientists Evaluated AI for Detecting Cognitive Decline. The Findings Were Surprising

It’s been just under two years since OpenAI introduced ChatGPT to the public, allowing anyone online to collaborate with an AI on tasks ranging from poetry and schoolwork to drafting letters for their landlord.

Today, ChatGPT is just one of several advanced large language models (LLMs) capable of responding to basic queries in a way that feels remarkably human.

However, researchers in Israel have discovered that this human-like quality may go further than intended—finding that LLMs experience a form of cognitive decline that worsens over time, much like the aging human brain.

The team tested publicly available chatbots, including ChatGPT versions 4 and 4o, two iterations of Alphabet’s Gemini, and version 3.5 of Anthropic’s Claude, using a series of cognitive assessments.

If these models were truly intelligent, the results would be alarming.

Researchers Identify Cognitive Decline in AI Models, Drawing Parallels to Human Neurodegeneration

In their published study, neurologists Roy Dayan and Benjamin Uliel from Hadassah Medical Center, along with data scientist Gal Koplewitz from Tel Aviv University, describe a level of cognitive deterioration comparable to neurodegenerative processes in the human brain.

Despite their conversational fluency, LLMs function more like predictive text systems than biological brains that actively generate knowledge. While their statistical approach enables rapid and personable responses, it also makes them highly susceptible to misinformation—struggling to distinguish fact from fiction.
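To make the "predictive text" comparison concrete, here is a minimal, purely illustrative sketch of next-word prediction built from bigram counts. This is not how the models in the study are actually implemented (they use neural networks trained on vast corpora), but it captures the basic idea of generating language by choosing a statistically likely continuation rather than by reasoning about facts.

```python
from collections import Counter, defaultdict

# Toy illustration only: real LLMs predict the next token with a neural
# network, not bigram counts, but the underlying idea is the same --
# continue the text with whatever is statistically likely to come next.

corpus = "the clock shows the time the clock has hands".split()

# Count which word tends to follow which.
followers = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    followers[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = followers.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(predict_next("the"))    # -> "clock" (the most common follower here)
print(predict_next("clock"))  # -> "shows"
```

A predictor like this will happily produce fluent-looking continuations whether or not they are true, which is why the researchers stress that statistical fluency is not the same as understanding.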

To be fair, human cognition isn’t flawless either. But as AI takes on increasingly critical roles, from medical guidance to legal advice, expectations have risen that each new generation of LLMs will become better at reasoning about the information they generate.

To evaluate the gap between current AI capabilities and human cognition, the researchers subjected these models to a battery of tests, including the Montreal Cognitive Assessment (MoCA)—a tool commonly used by neurologists to assess memory, spatial awareness, and executive function.

AI Cognitive Assessment Reveals Varying Levels of Impairment Across Models

ChatGPT 4o achieved the highest score on the assessment, earning 26 out of 30 points, which falls within the range of mild cognitive impairment. ChatGPT 4 and Claude followed closely with 25 points, while Gemini lagged significantly behind with just 16 points—a score that, in humans, would indicate severe impairment.

Comparisons of five LLM MoCA scores. (Dayan et al., BMJ, 2025)

A closer look at the results reveals that all models struggled with visuospatial and executive function tasks.

Tasks such as trail-making, replicating a simple cube design, and drawing a clock proved especially challenging for the LLMs, with most either failing outright or requiring detailed instructions to complete them.

Attempts to draw a Necker cube (top left) by a human (top right) and ChatGPT versions 4 (bottom left) and 4o (bottom right). (Dayan et al., BMJ, 2025)

AI Models Display Dementia-Like Responses in Spatial Awareness Tests

Some responses regarding spatial awareness resembled those given by dementia patients. For example, Claude answered, "the specific place and city would depend on where you, the user, are located at the moment."

Similarly, all models displayed a lack of empathy in a section of the Boston Diagnostic Aphasia Examination, a trait often linked to frontotemporal dementia.

As expected, older LLM versions performed worse than newer ones, suggesting that each generation improves upon the cognitive limitations of its predecessors.

The researchers acknowledge that LLMs are not human brains, making it impossible to diagnose them with dementia. However, their findings challenge the assumption that AI is on the brink of revolutionizing clinical medicine, a field that often depends on interpreting complex visual information.

With AI development progressing rapidly, a future LLM may eventually achieve perfect scores on cognitive assessments. Until then, even the most advanced models should be approached with caution when offering advice.


Read the original article on: Science Alert

Read more: Human Minibrains Sent to Space Thrived in an Unexpected Manner
