AI Ethics in Healthcare
Algorithmic Bias in Medical AI: The Problem Nobody Wants to Talk About
In 2019, Ziad Obermeyer and colleagues published a landmark analysis in Science documenting that a widely used healthcare algorithm, deployed by major U.S. health systems to identify patients for high-risk care management programs, systematically underestimated the health needs of Black patients relative to White patients at equivalent levels of objective illness. The algorithm used healthcare cost as a proxy for health need. Because Black patients historically spent less on healthcare at the same level of illness (due to documented disparities in access and utilization), the algorithm scored them as healthier and assigned them lower risk scores, which reduced their access to care management programs.
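The proxy problem can be made concrete with a toy ranking exercise. The sketch below is not the algorithm from the study; it is a minimal illustration under made-up assumptions: two hypothetical groups (A and B) with identical illness burden, where group B generates less spending per condition. The names `Patient`, `SPEND_PER_CONDITION`, and `enroll`, and every number, are invented for illustration.

```python
from typing import NamedTuple

class Patient(NamedTuple):
    group: str
    chronic_conditions: int  # stand-in for objective illness burden

# Hypothetical assumption: identical illness produces less spending in
# group B because of access barriers, not because of better health.
SPEND_PER_CONDITION = {"A": 1000.0, "B": 650.0}

def cost_label(p: Patient) -> float:
    """Proxy label: annual spending."""
    return p.chronic_conditions * SPEND_PER_CONDITION[p.group]

def need_label(p: Patient) -> float:
    """Direct label: illness burden itself."""
    return float(p.chronic_conditions)

def enroll(patients, label, slots):
    """Give the limited program slots to the highest-scoring patients."""
    return sorted(patients, key=label, reverse=True)[:slots]

def count_by_group(selected):
    counts = {"A": 0, "B": 0}
    for p in selected:
        counts[p.group] += 1
    return counts

# Both groups have the same illness distribution: 1 to 10 conditions.
patients = [Patient(g, n) for g in ("A", "B") for n in range(1, 11)]

by_cost = count_by_group(enroll(patients, cost_label, slots=6))
by_need = count_by_group(enroll(patients, need_label, slots=6))
print(by_cost)  # {'A': 5, 'B': 1}
print(by_need)  # {'A': 3, 'B': 3}
```

With equal illness in both groups, ranking on the cost proxy gives five of six slots to group A, while ranking on illness burden splits them evenly. The disparity comes entirely from the choice of label.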
The Training Data Problem
Medical AI systems learn from historical data. If that historical data reflects decades of unequal access, differential treatment, and systematic exclusion from clinical research, the AI will encode those inequalities as learned patterns — and apply them at scale in ways that amplify rather than reduce healthcare disparities.
This is not a hypothetical risk. It is documented in multiple clinical AI domains:
Dermatology: A Case Study in Bias
A 2019 analysis published in Nature Medicine found that AI systems for skin cancer detection — trained predominantly on images of light-skinned patients — showed significantly lower sensitivity for detecting melanoma in darker skin tones. The training datasets used by multiple commercial and research dermatology AI systems were found to contain 80–90% images of Fitzpatrick skin types I–III (light to medium skin), despite the fact that skin cancer presentation and risk profiles differ substantially across skin types.
A 2021 follow-up study in The Lancet Digital Health found that the performance gap persisted across multiple FDA-cleared and commercially deployed dermatology AI tools, with sensitivity rates for melanoma in Fitzpatrick types V–VI (dark skin) running 15–20 percentage points below performance on lighter skin types.
Pulse Oximetry: A Bias That Cost Lives
During the COVID-19 pandemic, a problem with pulse oximetry — a technology in clinical use since the 1980s — became undeniable. Studies published in NEJM and JAMA Internal Medicine documented that standard pulse oximeters systematically overestimate blood oxygen saturation in patients with darker skin pigmentation — leading to delayed intervention for hypoxia in Black patients during COVID-19 hospitalization.
A 2022 analysis found that Black patients were three times as likely as White patients to have occult hypoxia (dangerously low blood oxygen not detected by their pulse oximeter reading). The FDA issued a safety communication in 2021 acknowledging this limitation and calling for improved device testing across skin tones, but as of 2026, no mandatory skin-tone performance standards exist for pulse oximeters.
“Pulse oximetry bias is not a new problem — the physics were understood for decades. What COVID did was scale the clinical consequences to the point that they could not be ignored.” — NEJM editorial, 2022
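Occult hypoxia, as described above, can be measured when a pulse oximeter reading is paired with an arterial blood gas measurement taken at about the same time. The following is a minimal sketch of that comparison. The two thresholds, the group labels, and the sample readings are illustrative assumptions, not values taken from the cited studies, which define their own cutoffs.

```python
from collections import defaultdict

# Illustrative thresholds (assumptions): a reassuring oximeter reading
# alongside a low arterial saturation from a blood gas sample.
SPO2_REASSURING_MIN = 92.0  # pulse oximeter reading, percent
SAO2_HYPOXEMIC_MAX = 88.0   # arterial saturation, percent

def occult_rate_by_group(pairs):
    """pairs: iterable of (group, spo2, sao2) paired measurements.

    Returns, per group, the share of reassuring oximeter readings
    that coincided with a hypoxemic arterial measurement.
    """
    reassuring = defaultdict(int)
    occult = defaultdict(int)
    for group, spo2, sao2 in pairs:
        if spo2 >= SPO2_REASSURING_MIN:
            reassuring[group] += 1
            if sao2 < SAO2_HYPOXEMIC_MAX:
                occult[group] += 1
    return {g: occult[g] / reassuring[g] for g in reassuring}

# Made-up paired readings, for illustration only.
pairs = [
    ("group_1", 94, 93), ("group_1", 95, 87),
    ("group_1", 93, 92), ("group_1", 96, 95),
    ("group_2", 94, 86), ("group_2", 95, 87),
    ("group_2", 93, 92), ("group_2", 96, 90),
    ("group_2", 90, 85),  # oximeter already low: not "occult"
]
rates = occult_rate_by_group(pairs)
print(rates)  # {'group_1': 0.25, 'group_2': 0.5}
```

The denominator is restricted to readings the oximeter reported as reassuring, because those are the cases in which a clinician relying on the device alone would not intervene.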
Pain Assessment AI and Racial Disparities
Research published in PNAS documented that AI systems trained on facial expression data to assess patient pain systematically underestimate pain in Black patients — reflecting the documented human bias in pain assessment that multiple clinical studies have identified. Black patients receive less pain medication than White patients in emergency departments at equivalent reported pain levels. AI trained on human-generated pain assessment data learns and perpetuates this disparity.
What Regulatory Bodies Are Doing
The FDA’s Digital Health Center of Excellence has made algorithmic bias a stated priority in its AI/ML guidance development. The 2021 FDA AI/ML-Based Software as a Medical Device Action Plan includes algorithmic bias evaluation as a component of pre-market review, but enforcement mechanisms and specific performance requirements remain under development.
The EU AI Act explicitly addresses algorithmic bias through its data governance requirements: training data used for high-risk AI systems must be representative of the populations in which the system will be deployed, and performance must be evaluated across demographic groups.
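One simple way to screen a dataset for representativeness is to compare each group's share of the training data with its share of the intended deployment population. The sketch below assumes hypothetical counts and population shares, a hypothetical helper `representation_gaps`, and an arbitrary cutoff `min_ratio`. The EU AI Act does not prescribe this metric or any threshold.

```python
def representation_gaps(train_counts, population_share, min_ratio=0.8):
    """Flag groups whose share of the training data falls below
    min_ratio times their share of the deployment population.

    train_counts: {group: number of training examples}
    population_share: {group: fraction of deployment population}
    """
    total = sum(train_counts.values())
    flagged = {}
    for group, expected in population_share.items():
        observed = train_counts.get(group, 0) / total
        ratio = observed / expected
        if ratio < min_ratio:
            flagged[group] = round(ratio, 2)
    return flagged

# Made-up numbers, grouped by Fitzpatrick skin type for illustration.
train_counts = {"I-III": 850, "IV": 100, "V-VI": 50}
population_share = {"I-III": 0.6, "IV": 0.2, "V-VI": 0.2}

gaps = representation_gaps(train_counts, population_share)
print(gaps)  # {'IV': 0.5, 'V-VI': 0.25}
```

A ratio of 0.25 means the group appears in the training data at a quarter of the rate it would appear among patients the system is meant to serve. A check like this covers only data composition; it does not replace evaluating performance across demographic groups.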
What Needs to Change
The path forward requires diversification of training datasets, mandatory sub-group performance reporting in regulatory submissions, and post-market surveillance that includes demographic performance monitoring. It also requires the uncomfortable acknowledgment that deploying AI systems trained on biased data at scale — without demographic performance validation — is an active choice that carries clinical and ethical consequences.
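Sub-group performance reporting can start from something as simple as computing sensitivity separately for each demographic group instead of once for the whole test set. The sketch below shows one possible shape for such a report. The record format, group names, and counts are assumptions made for illustration, not data from any study cited here.

```python
from collections import defaultdict

def sensitivity_by_group(records):
    """records: iterable of (group, has_disease, model_flagged).

    Sensitivity = true positives / (true positives + false negatives),
    computed only over patients who actually have the disease.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, has_disease, model_flagged in records:
        if not has_disease:
            continue
        if model_flagged:
            true_pos[group] += 1
        else:
            false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}

def max_gap(sensitivities):
    """Largest difference in sensitivity between any two groups."""
    return max(sensitivities.values()) - min(sensitivities.values())

# Made-up evaluation set: 10 true cases per group, plus some healthy patients.
records = (
    [("lighter", True, True)] * 9 + [("lighter", True, False)] * 1
    + [("darker", True, True)] * 7 + [("darker", True, False)] * 3
    + [("lighter", False, False)] * 5 + [("darker", False, False)] * 5
)
report = sensitivity_by_group(records)
print(report)           # sensitivity 0.9 for "lighter", 0.7 for "darker"
print(max_gap(report))  # about 0.2, i.e. 20 percentage points
```

In this made-up set the pooled sensitivity is 80%, a figure that hides a 20-point gap between the groups. That is why an aggregate metric alone is not enough for a regulatory submission or for post-market monitoring.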
Sources: Science, Obermeyer et al., algorithmic bias study, 2019. Nature Medicine, dermatology AI bias, 2019. NEJM, pulse oximetry bias in COVID-19, 2022. PNAS, AI pain assessment bias, 2023. FDA Digital Health Center of Excellence AI/ML guidance, 2023.