In two recent medical studies, AI either matched or surpassed the performance of doctors.

      Two AI systems have matched, and in some cases surpassed, doctors in diagnosing patients and formulating treatment plans. None of the patients involved, however, were real. The findings, published in Nature this week, provide strong evidence that specialized medical AI is approaching the capabilities of human clinicians. They also illustrate the gap between eye-catching headlines and the realities of clinical practice.

      Findings from the studies

      The first AI system, Mira, was developed by academic researchers in Germany. When given access to a simulated medical record, it can select from over 85,000 actions, including tests, prescriptions, and hospital admissions. In over 500 emergency department cases, it achieved a diagnostic accuracy of approximately 87 percent, compared to 78 percent for a panel of six doctors. It performed best on conditions with clear test results, such as pancreatitis and appendicitis.

      The second system, Amie, created by Google, is based on its Gemini model. When tested against 21 UK general practitioners on 100 multi-visit cases, it matched their clinical reasoning and generated treatment plans that aligned more closely with official guidelines. On a benchmark of challenging medication decisions, it outperformed the GPs.

      Why the headline is misleading

      A closer look reveals a more nuanced picture. Both systems were evaluated on simulated patients, using clean, text-only case notes. There were no physical examinations, imaging scans, or assessments of a patient's tone or body language, all of which are critical components of real medical practice.

      Independent experts have raised additional concerns. Amie was rewarded for adhering to guidelines, which doctors are not required to follow strictly, so the comparison was uneven. Mira ordered nearly twice as many tests as the doctors, and ordering more tests can artificially inflate an accuracy score. The models tested are also about two years old, and the researchers suggest that their age may make them less effective than current alternatives.

      Autopilot, not replacement

      The researchers are cautious about what these results imply. Jakob Kather, co-developer of Mira, likens the AI to an aircraft’s autopilot: it can manage routine tasks, but “ultimate responsibility will always remain with the physicians.” That is likely where medicine is heading, and the shift is already under way.

      AI is being integrated into real healthcare systems to address workforce shortages and reduce administrative burdens, and it is being promoted to patients as a tool for consumer health advice. The Nature studies do not indicate that doctors are becoming obsolete. Rather, they demonstrate that, in a simulation, machines can now reason similarly to doctors. That is impressive, but a simulation is still far from a real hospital environment.
