A recent study published in The BMJ highlights a concerning issue with popular AI chatbots. These tools, often lauded for their potential in medical diagnostics, exhibit signs of mild cognitive impairment akin to early dementia, calling their reliability for clinical use into question. The study tested several large language models (LLMs), including ChatGPT 4 and 4o, Claude 3.5 Sonnet, and Gemini 1.0 and 1.5, using the Montreal Cognitive Assessment (MoCA), a screening tool widely applied to detect cognitive impairment in humans. ChatGPT 4o scored highest at 26 out of 30, just meeting the threshold for normal cognition, while Gemini 1.0 scored lowest at 16.
Older chatbot versions performed worse, mimicking the cognitive decline observed in aging humans. These results challenge the idea that AI will soon replace human doctors.
Testing Methodology
The MoCA test evaluates abilities such as memory, attention, language, visuospatial skills, and executive functions. The instructions for chatbots mirrored those given to human patients, and a neurologist scored the results.
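For readers unfamiliar with MoCA scoring, the sketch below shows how the reported numbers map onto the conventional cutoff. The ≥26 threshold is the standard MoCA criterion for normal cognition; only the two scores quoted above are included, and the snippet is purely illustrative, not part of the study.

```python
# Illustrative sketch: mapping MoCA scores onto the conventional cutoff.
# A score of 26 or above out of 30 is the standard threshold for normal
# cognition; the per-model scores are the two reported in the article.

MOCA_MAX = 30
NORMAL_CUTOFF = 26  # scores >= 26 are conventionally considered normal

reported_scores = {
    "ChatGPT 4o": 26,   # highest score in the study, exactly at the cutoff
    "Gemini 1.0": 16,   # lowest score in the study
}

for model, score in reported_scores.items():
    status = "normal range" if score >= NORMAL_CUTOFF else "suggests impairment"
    print(f"{model}: {score}/{MOCA_MAX} -> {status}")
```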
While chatbots excelled in tasks like naming, attention, language, and abstraction, their performance in visuospatial and executive tasks was weak. For example, tasks like drawing a clock face or connecting encircled numbers and letters in sequence proved challenging for all models.
Specific Failures in AI Models
These weaknesses were consistent across models. Both Gemini versions particularly struggled with the delayed recall task, failing to remember a five-word sequence. None of the chatbots could accurately interpret a complex visual scene, and their descriptions showed a notable lack of empathy.
Interestingly, ChatGPT 4o was the only model to succeed at the incongruent stage of the Stroop test, which measures interference with reaction times by presenting color words printed in a non-matching font color.
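To make the Stroop setup concrete, here is a minimal sketch of how congruent and incongruent trials are typically constructed. The color list and helper function are illustrative assumptions, not the study's actual materials.

```python
# Illustrative Stroop-style stimuli (not the study's actual materials).
# In an incongruent trial, the word names one color while it is displayed
# in a different color; the task is to report the display color, resisting
# the automatic urge to read the word itself.
import random

COLORS = ["red", "green", "blue", "yellow"]

def make_trial(incongruent: bool) -> tuple[str, str]:
    """Return (word, display_color); the two differ in incongruent trials."""
    word = random.choice(COLORS)
    if incongruent:
        display = random.choice([c for c in COLORS if c != word])
    else:
        display = word
    return word, display

word, display = make_trial(incongruent=True)
print(f'Word "{word.upper()}" shown in {display}; correct answer: {display}')
```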
Implications for Medical Use
The findings underscore the limitations of current AI technology in clinical settings. Despite rapid advances, the cognitive deficits the study documents raise doubts about chatbots' ability to replace human physicians, and poor performance in visuospatial and executive functions could hinder their usefulness in medical diagnostics.
Researchers noted that these observations highlight fundamental differences between human cognition and AI capabilities. They also raised concerns that AI chatbots might soon require their own form of “treatment” for virtual cognitive impairments.
The study concludes that neurologists are unlikely to be replaced by AI chatbots in the foreseeable future. Instead, the focus may shift toward addressing the cognitive limitations of these models. This research emphasizes the need for caution when integrating AI into critical fields like medicine.
Cognitive Gaps and Their Implications
AI chatbots like ChatGPT 4o and Gemini 1.0 struggled most in areas requiring visuospatial skills and executive functions. Tasks such as clock drawing and sequence connection probe the ability to translate abstract rules into action, a skill critical for real-world problem-solving. These failures suggest that while AI can handle structured data well, it falters in situations requiring creativity, context, or non-linear thinking.
The lack of empathy and inability to interpret complex visual scenes further highlight the models’ deficiencies. In medical practice, empathy is essential for building trust and ensuring accurate patient care. Without this quality, reliance on AI could risk reducing medicine to a mechanical process, overlooking the human element that underpins effective diagnosis and treatment.
Broader Challenges and Ethical Concerns
The study also highlights the issue of anthropomorphizing AI. Viewing chatbots as human-like entities creates unrealistic expectations. While they process information quickly, their inability to adapt or innovate independently limits their utility in complex, dynamic fields like medicine.
Additionally, the uniform failure of AI models in executive tasks raises ethical questions. Should these tools be deployed in sensitive areas like healthcare without addressing such flaws? Overreliance on AI may lead to errors, eroding patient confidence and trust in medical systems. The findings also emphasize the need to balance the excitement around AI with rigorous testing and transparency to ensure safety and effectiveness.