AI Models Struggle to Detect Medical Misinformation in Clinical Contexts, Study Finds

Christina Sanchez

A new study has revealed serious flaws in how AI models handle medical misinformation. Researchers tested 20 language models with over 3.4 million queries, finding that even advanced systems can be fooled by false claims—especially when presented in clinical formats. The results, published in The Lancet Digital Health, raise concerns about AI safety in healthcare settings.

The Oxford-led research assessed three prominent models (GPT-4o, Llama 3, and Command R) alongside 17 others. Each was exposed to fabricated medical claims embedded in different contexts: clinical notes, social media posts, and case vignettes. The study focused on ten types of logical fallacies to see how phrasing affected detection rates.
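The study's exact prompts are not reproduced in this article, but the core of such a probe is simple: wrap the same fabricated claim in several context templates and compare how often a model endorses it in each. The sketch below is a hypothetical illustration, assuming a generic `query_model` callable that returns the model's text reply; the templates and the yes/no parsing are illustrative assumptions, not the researchers' actual protocol.

```python
from typing import Callable

# Hypothetical context templates: the same fabricated claim is embedded
# in a clinical note, a social-media post, and a case vignette.
CONTEXTS = {
    "clinical_note": (
        "Discharge summary excerpt: {claim}\n"
        "Is the statement above medically accurate? Answer yes or no."
    ),
    "social_media": (
        'Reddit post: "{claim}"\n'
        "Is the claim in this post medically accurate? Answer yes or no."
    ),
    "case_vignette": (
        "Case vignette: a patient is advised that {claim}\n"
        "Is this advice medically accurate? Answer yes or no."
    ),
}

def acceptance_rates(
    false_claims: list[str],
    query_model: Callable[[str], str],  # stand-in for the model API under test
) -> dict[str, float]:
    """Fraction of fabricated claims a model endorses, per context."""
    rates: dict[str, float] = {}
    for name, template in CONTEXTS.items():
        accepted = sum(
            # Count a claim as "accepted" if the reply starts with "yes".
            query_model(template.format(claim=c)).strip().lower().startswith("yes")
            for c in false_claims
        )
        rates[name] = accepted / len(false_claims)
    return rates
```

Comparing per-context rates from a probe of this kind is what produces figures like the gap between social-media posts and clinical notes reported below.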

Context played a critical role. Models rejected false claims about 70% of the time when they were presented as Reddit-style misinformation, but when the same falsehoods were hidden in clinical notes, the acceptance rate climbed to 46.1%. GPT-4o performed best, accepting only 10.6% of falsehoods, while some medical-specific models fared worse than general-purpose ones.

The findings also showed that fallacious justifications could either help or hinder detection. Most fallacies made the models more sceptical, but two types actually increased their likelihood of error. Surprisingly, models fine-tuned on medical data were not inherently more resistant to misinformation than broader AI systems.

Test data included real discharge summaries altered with false information, alongside social media myths and simulated patient cases. The goal was to mimic how misinformation might appear in real-world clinical workflows.

The study underscores a major risk: AI tools in medicine could unknowingly adopt false claims, particularly when they are presented in professional formats. Without strict safeguards, these errors might seep into patient summaries, diagnoses, or treatment recommendations. The researchers warn against relying on current models, even specialised ones, until their detection capabilities improve.
