ChatGPT Health Faces Scrutiny Over Medical Triage Accuracy
OpenAI’s ChatGPT Health, launched in January 2026, is under fire following a study published in Nature Medicine that revealed significant inaccuracies in its medical triage recommendations. The study, which tested the AI’s ability to assess the severity of medical cases, found that the chatbot frequently underestimated the urgency of emergencies and was inconsistent in identifying suicidal ideation.
Study Highlights Critical Flaws
Researchers evaluated ChatGPT Health using 60 clinician-authored medical scenarios, each with 16 variations, totaling 960 responses. The findings indicate that the AI “under-triaged” 51.6% of emergency cases, suggesting less urgent care – such as a doctor’s appointment within 24 to 48 hours – when immediate emergency room attention was required. Specifically, the AI failed to recommend emergency care for conditions like diabetic ketoacidosis and impending respiratory failure, both potentially life-threatening situations.
Conversely, ChatGPT Health over-triaged 64.8% of non-urgent cases, recommending a doctor’s visit when at-home care would have sufficed. For example, a patient experiencing a three-day sore throat was advised to seek medical attention within 24 to 48 hours.
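To make the reported rates concrete, here is a minimal sketch of how under- and over-triage rates like these could be tallied against clinician gold labels. The three-level acuity scale, the `Response` structure, and the sample records are illustrative assumptions, not the study’s actual methodology or code.

```python
# Minimal sketch of tallying triage accuracy; not the study's code.
# The acuity scale, Response fields, and sample data are assumptions.
from dataclasses import dataclass

ACUITY = ["self_care", "doctor_24_48h", "emergency"]  # ordered: later = more urgent
RANK = {level: i for i, level in enumerate(ACUITY)}

@dataclass
class Response:
    true_acuity: str         # clinician-assigned gold label
    recommended_acuity: str  # level the chatbot recommended

def under_triage_rate(responses: list[Response]) -> float:
    """Share of true emergencies that received a less urgent recommendation."""
    emergencies = [r for r in responses if r.true_acuity == "emergency"]
    under = sum(RANK[r.recommended_acuity] < RANK[r.true_acuity] for r in emergencies)
    return under / len(emergencies)

def over_triage_rate(responses: list[Response]) -> float:
    """Share of true self-care cases that received a more urgent recommendation."""
    non_urgent = [r for r in responses if r.true_acuity == "self_care"]
    over = sum(RANK[r.recommended_acuity] > RANK[r.true_acuity] for r in non_urgent)
    return over / len(non_urgent)

# Illustrative: an under-triaged emergency (e.g. diabetic ketoacidosis given a
# 24-48h recommendation) and an over-triaged three-day sore throat.
sample = [
    Response("emergency", "doctor_24_48h"),
    Response("self_care", "doctor_24_48h"),
]
print(under_triage_rate(sample), over_triage_rate(sample))  # 1.0 1.0
```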
Inconsistent Crisis Intervention
The study also revealed unpredictable behavior in crisis intervention. Although ChatGPT Health is designed to direct individuals expressing suicidal intent to the 988 Suicide & Crisis Lifeline, the AI activated this safeguard inconsistently, sometimes referring users unnecessarily and at other times failing to refer them when needed. Researchers described the bot’s performance as “paradoxical” and “inverted to clinical risk.”
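One way to read that “inconsistent” finding is as an error pattern that can be scored directly: compare the bot’s 988 referrals against clinician judgments of whether a referral was warranted. The sketch below does this with invented labels; it illustrates the metric, not the study’s protocol.

```python
# Sketch: scoring 988-referral decisions against clinician labels.
# The case tuples are invented for illustration, not data from the study.
def referral_consistency(cases: list[tuple[bool, bool]]) -> tuple[float, float]:
    """cases: (clinician_says_refer, bot_referred) pairs.
    Returns (sensitivity, false-referral rate)."""
    needed = [bot for should, bot in cases if should]
    not_needed = [bot for should, bot in cases if not should]
    sensitivity = sum(needed) / len(needed)
    false_referral_rate = sum(not_needed) / len(not_needed)
    return sensitivity, false_referral_rate

# A "paradoxical" pattern: a missed referral on a high-risk case alongside
# an unnecessary one on a low-risk case.
cases = [(True, False), (True, True), (False, True), (False, False)]
print(referral_consistency(cases))  # (0.5, 0.5)
```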
Demographic Factors Show No Significant Impact
Interestingly, the study found no significant differences in triage recommendations based on patient race or gender. However, researchers cautioned that the confidence intervals did not entirely rule out the possibility of clinically meaningful differences.
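To illustrate that statistical caveat with a minimal sketch: a 95% confidence interval for the difference between two groups’ under-triage proportions can include zero (no significant difference) while its bounds still allow an effect large enough to matter clinically. The counts below are invented, not taken from the paper.

```python
# Wald 95% CI for a difference in proportions; counts are invented.
import math

def diff_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """CI for p1 - p2, where x successes out of n trials per group."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# e.g. 26/48 vs 22/48 under-triaged emergencies in two demographic groups:
lo, hi = diff_ci(26, 48, 22, 48)
print(f"({lo:.2f}, {hi:.2f})")  # ~(-0.12, 0.28): crosses zero, yet the upper
# bound leaves room for a ~28-point gap, a clinically meaningful difference.
```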
OpenAI’s Response and Ongoing Development
OpenAI acknowledged the research and stated that the study may not fully reflect how ChatGPT Health is intended to be used. A spokesperson emphasized that the chatbot is designed for users to provide follow-up information and context, rather than relying on a single response to a medical scenario. The company also highlighted that ChatGPT Health is currently available to a limited number of users and is undergoing continuous improvement to enhance its safety and reliability.
Expert Concerns and Future Implications
Dr. Ashwin Ramaswamy, lead author of the study and an instructor of urology at The Mount Sinai Hospital, cautioned against relying on AI for emergency medical advice. He emphasized the importance of using AI as a supplement to, rather than a replacement for, a physician’s expertise. Other experts echo this sentiment, stressing the need for rigorous testing before deploying AI-powered triage systems on a large scale.
With more than 40 million people worldwide using ChatGPT for health-related questions, and nearly 2 million messages per week concerning insurance, the potential impact of inaccurate triage recommendations is substantial. As AI continues to integrate into healthcare, ensuring patient safety and accuracy remains paramount.
Key Takeaways
- ChatGPT Health demonstrates significant inaccuracies in medical triage, under-triaging over half of emergency cases.
- The AI exhibits inconsistent behavior in identifying and responding to suicidal ideation.
- OpenAI is actively working to improve the safety and reliability of ChatGPT Health.
- Experts emphasize the importance of using AI as a supplement to, not a replacement for, professional medical advice.