ChatGPT-4 Vision Fails Skin Disease Diagnosis, Especially on Darker Skin | Study

by Anika Shah - Technology March 18, 2026

ChatGPT-4 Vision Falls Short in Skin Disease Diagnosis, Especially for Darker Skin Tones

A recent study published in SKIN: The Journal of Cutaneous Medicine reveals significant limitations in ChatGPT-4 Vision’s ability to accurately diagnose skin conditions from images alone, particularly in patients with darker skin tones. The findings raise concerns about the reliability of AI-driven visual diagnosis and highlight the potential for exacerbating existing healthcare disparities.

Study Methodology and Findings

Researchers evaluated ChatGPT-4 Vision on a dataset of 150 images representing the 15 most common inpatient skin diseases. The dataset was balanced, with 75 images of patients with light skin and 75 of patients with darker skin. Performance was scored in two ways: whether the model's primary diagnosis was correct, and whether the correct diagnosis appeared among its top three differential diagnoses. The model received only the images, with no accompanying text.

The results were concerning. For light-skinned patients, ChatGPT-4 Vision's primary diagnosis was correct in 57.3% of cases. For patients with darker skin, however, its accuracy dropped to 42.7%. Even when the correct diagnosis only had to appear among the top three, accuracy remained below 75% for both skin tones.
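As a quick sanity check on the reported figures, the sketch below converts the top-1 percentages back into image counts. It assumes each percentage is computed over the 75 images in its skin-tone group; the helper name `implied_correct` and the `reported` dictionary are illustrative only and do not come from the study.

```python
# Illustrative sketch: recover how many images each reported top-1 accuracy
# corresponds to, ASSUMING each percentage is over 75 images per skin-tone group.
IMAGES_PER_GROUP = 75

# Top-1 accuracy in percent, as reported in the article.
reported = {"light skin": 57.3, "darker skin": 42.7}


def implied_correct(pct: float, n: int = IMAGES_PER_GROUP) -> int:
    """Hypothetical helper: round pct% of n to the nearest whole image."""
    return round(pct / 100 * n)


counts = {group: implied_correct(pct) for group, pct in reported.items()}

for group, k in counts.items():
    # Recompute the percentage from the whole-image count to confirm it matches.
    print(f"{group}: {k}/{IMAGES_PER_GROUP} = {100 * k / IMAGES_PER_GROUP:.1f}%")

# Difference between the two groups, in percentage points.
gap = reported["light skin"] - reported["darker skin"]
print(f"gap: {gap:.1f} percentage points")
```

Under that assumption, the figures correspond to 43 of 75 images correct for light skin and 32 of 75 for darker skin, a gap of 14.6 percentage points.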

The AI struggled particularly with complex conditions like cutaneous lymphomas and fungal infections.

The Problem of Bias in Training Data

The performance gap between skin tones underscores a critical issue in medical AI: biased training data. Experts believe that datasets used to train models like ChatGPT-4 Vision often contain a disproportionately large number of images of light-skinned individuals. This imbalance can lead to AI systems that are less accurate when analyzing images of people with darker skin.

In addition, certain dermatological signs, such as redness, can be harder to detect visually on darker skin. Without sufficient exposure to diverse images during training, AI models may miss these subtle cues, leading to misdiagnosis.

Visual Diagnosis vs. Text-Based Diagnosis

Interestingly, previous studies have shown that AI models diagnosing from text descriptions can reach accuracy rates of up to 90%. This gap highlights how demanding visual pattern recognition in dermatology is: it draws on clinical experience that AI trained on text patterns or general-purpose image data cannot easily replicate. Multimodal models are an advance in image recognition, but their use in medicine is still at an early stage.

The Future of AI in Dermatology: Assistance, Not Autonomy

The study suggests a shift in focus for AI in dermatology. Rather than pursuing fully autonomous diagnostic systems, the emphasis is moving toward assistive tools that support clinicians. Such specialized models would be trained on high-quality, diverse medical image data and designed to help with differential diagnosis or to provide second opinions.

However, experts caution that it will likely be several years before such systems are ready for widespread clinical use. For the foreseeable future, the expertise of human dermatologists remains the gold standard in patient care.

Key Takeaways

ChatGPT-4 Vision demonstrates limited accuracy in diagnosing skin conditions from images.
The AI performs significantly worse on images of patients with darker skin tones.
Biased training data is likely a major contributor to the observed disparities.
The future of AI in dermatology lies in assistive tools that support, rather than replace, human clinicians.

Sources:

A Comparison of ChatGPT-4 Vision’s Diagnostic Accuracy, SKIN: The Journal of Cutaneous Medicine

Study Finds ChatGPT-4 Vision Not Yet Reliable for Diagnosing Inpatient Skin Conditions Across Skin Tones, Knox News
