In two recent medical studies, AI either matched or surpassed the performance of doctors.
Two AI systems have matched, and in some cases surpassed, doctors in diagnosing patients and formulating treatment plans. None of the patients involved, however, were real. The findings, published in Nature this week, provide strong evidence that specialized medical AI is approaching the capabilities of human clinicians. They also illustrate the gap between eye-catching headlines and the realities of clinical practice.
Findings from the studies
The first AI system, Mira, was developed by academic researchers in Germany. When given access to a simulated medical record, it can select from over 85,000 actions, including tests, prescriptions, and hospital admissions. In over 500 emergency department cases, it achieved a diagnostic accuracy of approximately 87 percent, compared to 78 percent for a panel of six doctors. It performed best on conditions with clear test results, such as pancreatitis and appendicitis.
The second system, Amie, created by Google, is based on its Gemini model. When tested against 21 UK general practitioners in 100 multi-visit cases, it matched their clinical reasoning and generated treatment plans that aligned more closely with official guidelines. In a benchmark for challenging medication decisions, it outperformed the GPs.
Why the headline is misleading
A closer look reveals a more nuanced picture. Both systems were evaluated on simulated patients, using clean, text-only case notes. There were no physical examinations, imaging scans, or assessments of a patient's tone or body language, all of which are critical components of real medical practice.
Independent experts have raised additional concerns. Amie was rewarded for adhering to guidelines, which doctors are not required to follow strictly, making the comparison uneven. Mira ordered nearly twice as many tests as the doctors, and ordering more tests can artificially inflate an accuracy score. The models tested are also about two years old, and the researchers suggest that their age may make them less effective than current alternatives.
Autopilot, not replacement
The researchers are cautious about what these results imply. Jakob Kather, co-developer of Mira, likens the AI to an aircraft’s autopilot: it can manage routine tasks, but “ultimate responsibility will always remain with the physicians.” That is likely where medicine is heading, and the shift is already under way.
AI is being integrated into real healthcare systems to address workforce shortages and reduce administrative burdens, and it is being promoted to patients as a tool for consumer health advice. The Nature studies do not indicate that doctors are becoming obsolete. Rather, they demonstrate that, in a simulation, machines can now reason similarly to doctors, which is impressive but still a long way from performing in a real hospital environment.
