A Harvard study tested large language models (LLMs) on real emergency room cases and found that at least one model outperformed human doctors in diagnostic accuracy. The research examines LLM performance across a range of medical contexts and raises questions about AI's potential role in clinical settings.