ChatGPT Health Faces Scrutiny Over AI Triage Accuracy
OpenAI’s ChatGPT Health, launched in January 2026, has rapidly gained millions of users seeking preliminary medical guidance. However, a recent rigorous evaluation reveals significant safety concerns regarding its ability to accurately triage medical emergencies, raising questions about the readiness of AI for widespread consumer use in healthcare.
AI Triage: A Troubling Rate of Under-Triaging
A structured stress test, detailed in a study published in Nature Medicine, found that ChatGPT Health’s performance followed an “inverted U-shaped pattern.” The AI system performed relatively well with straightforward cases but faltered significantly when faced with either non-urgent presentations or genuine emergencies. Specifically, the system under-triaged 52% of emergency cases, potentially directing patients experiencing life-threatening conditions like diabetic ketoacidosis and impending respiratory failure to routine 24–48-hour evaluations instead of immediate emergency department care.
Conversely, the AI sometimes flagged minor ailments as requiring immediate attention, potentially misallocating valuable healthcare resources. The study used 60 clinician-authored patient scenarios spanning 21 clinical domains, each tested under 16 different conditions, for a total of 960 responses (60 × 16).
The Impact of Biased Information and Suicidal Ideation
The evaluation also highlighted the impact of biased information on triage recommendations. When family or friends minimized a patient’s symptoms – a phenomenon known as anchoring bias – the AI’s recommendations shifted significantly towards less urgent care (OR 11.7, 95% CI 3.7-36.6). This suggests that the AI can be unduly influenced by external factors, potentially leading to delayed or inappropriate care.
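For readers who want to unpack the reported odds ratio, the short Python sketch below works only from the three published figures (OR 11.7, 95% CI 3.7 to 36.6). It assumes the interval is a standard Wald-type interval, symmetric on the log scale, which the article does not confirm; the function name `log_scale_summary` is illustrative and not taken from the study.

```python
import math


def log_scale_summary(odds_ratio, ci_low, ci_high, z=1.96):
    """Summarize a reported odds ratio and 95% CI on the log scale.

    Assumption (not confirmed by the article): the interval is a
    Wald-type interval, i.e. log(OR) +/- z * SE.
    """
    log_low = math.log(ci_low)
    log_high = math.log(ci_high)

    # Half the width of the interval on the log scale equals z * SE.
    se = (log_high - log_low) / (2 * z)

    # For a log-symmetric interval, the geometric midpoint of the
    # bounds should land close to the reported point estimate.
    midpoint = math.exp((log_low + log_high) / 2)

    # Distance of the estimate from "no effect" (OR = 1) in SE units.
    z_stat = math.log(odds_ratio) / se
    return se, midpoint, z_stat


se, midpoint, z_stat = log_scale_summary(11.7, 3.7, 36.6)
print(f"standard error of log(OR): {se:.2f}")
print(f"geometric midpoint of the CI: {midpoint:.1f}")
print(f"z statistic against OR = 1: {z_stat:.1f}")
```

Under that assumption, the midpoint of the interval sits close to the reported 11.7, and the lower bound of 3.7 stays well above 1, which is why the shift counts as statistically significant. The width of the interval also shows that the size of the effect is estimated only loosely.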
In addition, the activation of crisis intervention messages for individuals expressing suicidal ideation was unpredictable. The system was more likely to activate these messages when patients described no specific method of suicide than when they did, raising concerns about inconsistent and potentially ineffective support for those in crisis.
Demographic Factors and the Need for Further Validation
Interestingly, the study found that patient race, gender, and barriers to care did not significantly affect triage recommendations. However, the researchers noted that the confidence intervals did not exclude clinically meaningful differences, indicating a need for further investigation into potential disparities.
The Role of AI in Healthcare: A Cautious Approach
While OpenAI offers a HIPAA-compliant workspace, ChatGPT for Healthcare, designed to support clinicians with tasks like drafting charts and prior authorizations, the findings from this stress test underscore the need for caution when deploying AI-powered triage systems for direct consumer use. The missed high-risk emergencies and inconsistent activation of crisis safeguards raise significant safety concerns.
As reported by World Today Journal, prospective validation is crucial before widespread consumer deployment.
Key Takeaways
- ChatGPT Health demonstrates a concerning pattern of errors in medical triage, particularly at the extremes of urgency.
- The AI under-triaged over half of emergency cases, potentially delaying critical care.
- Anchoring bias – when others downplay symptoms – significantly influenced the AI’s recommendations.
- Crisis intervention message activation was inconsistent and unpredictable.
- Further validation is essential before widespread consumer deployment of AI triage systems.