Why Missing Sex and Gender Data Is the Hidden Weakness in Clinical AI — And What Clinicians Can Do

May 6, 2026

Summary

Across medical imaging, algorithmic decision support, and even large language models, recent studies show that clinical AI can learn and act on sex- and gender-related signals. At the same time, structured capture of sex, sexual orientation, and gender identity (SOGI) in electronic health records (EHRs) is often incomplete. This combination — models that implicitly encode sex or gender signals and health systems that lack reliable, interoperable data fields — undermines the ability to detect, audit and mitigate sex- and gender-based performance differences. This article summarizes the evidence, regulatory developments, and practical steps clinicians and health systems can take now to reduce risk for patients.

What the evidence shows

Imaging and diagnostic models can encode protected or socially meaningful features from routine data. Analysis of chest radiograph foundation models and downstream disease detectors found that learned image features carry information about biologic sex and race, and that subgroup true- and false-positive rates differed by sex and race in clinically relevant ways [3][4]. The authors warn that foundation models can propagate disparities into downstream tools unless developers provide subgroup performance data and transparent methods for mitigation.
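
A subgroup audit of this kind can be sketched in a few lines. The example below is a minimal illustration, not the method used in the cited studies: it assumes each prediction is available as a (group, true label, predicted label) triple, and the function name `subgroup_rates` and the toy records are hypothetical.

```python
from collections import defaultdict


def subgroup_rates(records):
    """Compute true- and false-positive rates per subgroup.

    `records` is an iterable of (group, y_true, y_pred) triples, where
    y_true and y_pred are 0/1 (or False/True). Hypothetical input format.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_true:
            key = "tp" if y_pred else "fn"
        else:
            key = "fp" if y_pred else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            # None signals that the rate is undefined for this subgroup.
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
            "n": positives + negatives,
        }
    return rates


# Toy data, invented for illustration only.
toy_records = [
    ("F", 1, 1), ("F", 1, 0), ("F", 0, 0), ("F", 0, 1),
    ("M", 1, 1), ("M", 1, 1), ("M", 0, 0), ("M", 0, 0),
]
for group, r in sorted(subgroup_rates(toy_records).items()):
    print(group, r)
```

Reporting the subgroup size alongside each rate matters in practice, because a rate computed on a handful of patients can look like a disparity, or hide one, by chance.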

Language models used for clinical reasoning are not immune. Experimental work on several general-purpose large language models found stable, model-specific skews in assigning sex in vignettes where sex was non-informative, producing downstream diagnostic differences unless models were conservatively configured or allowed to abstain [8].

Separately, retrospective analysis of EHRs shows large gaps in structured SOGI capture: one national study covering 2016–2021 reported that gender identity and sex-assigned-at-birth fields were missing for roughly three quarters of patients, while legal sex was nearly complete [5]. High missingness and inconsistent field use limit the ability to stratify algorithm performance, investigate adverse events, or satisfy regulatory expectations for subgroup analysis.
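
A local version of this completeness check is straightforward to run on an EHR extract. The sketch below assumes patient records are available as dictionaries; the field names (`legal_sex`, `sex_assigned_at_birth`, `gender_identity`), the list of tokens treated as missing, and the toy records are all hypothetical and would need to match the local data model.

```python
# Placeholder values treated as "not captured"; an assumption, adjust locally.
MISSING_TOKENS = {"", "unknown", "not recorded"}


def is_missing(value):
    """Return True if a field value is absent or a known placeholder."""
    if value is None:
        return True
    return str(value).strip().lower() in MISSING_TOKENS


def field_missingness(patients, fields):
    """Return the fraction of patients with each field missing."""
    total = len(patients)
    result = {}
    for field in fields:
        missing = sum(1 for p in patients if is_missing(p.get(field)))
        result[field] = missing / total if total else None
    return result


# Toy records, invented for illustration; not data from any study.
toy_patients = [
    {"legal_sex": "F", "sex_assigned_at_birth": "F", "gender_identity": "woman"},
    {"legal_sex": "M", "sex_assigned_at_birth": None, "gender_identity": ""},
    {"legal_sex": "F", "sex_assigned_at_birth": "unknown"},
    {"legal_sex": "M", "gender_identity": "Unknown "},
]
fields = ["legal_sex", "sex_assigned_at_birth", "gender_identity"]
for field, fraction in field_missingness(toy_patients, fields).items():
    print(f"{field}: {fraction:.0%} missing")
```

Counting placeholder values such as "unknown" as missing is a deliberate choice here: a populated field that carries no usable information does not help anyone stratify model performance.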

Regulatory and standards context

Regulators and standards bodies are increasingly explicit that sex- and subgroup-aware evaluation matters. The FDA’s recent final guidance on evaluation of sex-specific data in medical device clinical studies sets expectations for enrollment, analysis, reporting and labeling implications tied to sex-specific performance [1]. Separately, the FDA’s AI/ML Software as a Medical Device (SaMD) program and related guidances stress lifecycle monitoring, transparency, and the need to consider subgroup performance and post-market monitoring for systems that adapt over time [2].

Interoperability work at the Office of the National Coordinator for Health Information Technology (ONC) and in the United States Core Data for Interoperability (USCDI) is also relevant: SOGI elements were added to earlier USCDI versions, and draft discussions for a “Sex Parameter for Clinical Use” aim to distinguish sex assigned at birth from sex for clinical use. That distinction is a practical requirement for capturing and exchanging the right variable for a given clinical algorithm or device [9].

Why missing or ambiguous sex/gender data matters for clinicians and patients

  • Auditability: Without reliable structured fields, health systems cannot reliably stratify model performance by the appropriate sex/gender construct, limiting detection of disparate harms [5][6].
  • Clinical validity: Treating “sex” as a single, binary variable can embed incorrect physiologic assumptions and misclassify gender-diverse or intersex patients; algorithm designers and clinicians need clarity on which construct (sex assigned at birth versus current anatomy or physiology) matters for a given decision [7].
  • Informed care: Patients deserve to know whether a diagnostic tool was validated across populations that match their bodies and experiences; missing data makes that communication speculative rather than evidence-based [1][2].

Practical, evidence-based steps clinicians and systems can take now

  1. Adopt standard two-step measures for gender in clinical workflows (sex assigned at birth + current gender identity) and use recommended question language from consensus guidance — these measures improve clarity and interoperability for research and safety monitoring [6][10].
  2. Improve data capture practices: use patient-centered intake, portal activation, and consistent EHR fields to reduce missingness; operational checklists from clinical trial practice translate to routine care settings [5][10].
  3. Require product-level subgroup reporting: when selecting devices or AI tools, ask vendors for sex- and gender-stratified performance data and for their planned post-market monitoring and predetermined change-control plans per FDA AI/ML recommendations [1][2].
  4. Document the variable used: for each clinical rule or model, annotate whether decisions rely on legal sex, sex assigned at birth, current anatomy, or a physiologic measure — and use the appropriate EHR field consistently [7][9].
  5. Maintain human oversight and conservative configurations: for AI or LLM-assisted reasoning, prioritize conservative model settings, require human review for uncertain cases, and log instances where sex or gender influenced model output [2][8].
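
Steps 4 and 5 can be combined in a small amount of glue code. The sketch below is one possible shape under stated assumptions, not a standard or vendor API: the `SexConstruct` categories mirror the four options named in step 4, while `ModelAnnotation`, `resolve_input`, the field names, and the model name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SexConstruct(Enum):
    """Which sex-related construct a rule or model actually relies on."""
    LEGAL_SEX = "legal_sex"
    SEX_ASSIGNED_AT_BIRTH = "sex_assigned_at_birth"
    CURRENT_ANATOMY = "current_anatomy"
    PHYSIOLOGIC_MEASURE = "physiologic_measure"


@dataclass(frozen=True)
class ModelAnnotation:
    """Documents the construct and EHR field a model depends on."""
    model_name: str
    construct: SexConstruct
    ehr_field: str  # hypothetical local field name


def resolve_input(annotation, patient_record, audit_log):
    """Fetch the annotated field and log the lookup.

    Returns (value, needs_review). When the field is absent or blank, the
    case is flagged for human review instead of substituting another field.
    """
    value = patient_record.get(annotation.ehr_field)
    needs_review = value is None or str(value).strip() == ""
    audit_log.append({
        "model": annotation.model_name,
        "construct": annotation.construct.value,
        "ehr_field": annotation.ehr_field,
        "value_present": not needs_review,
        "routed_to_human_review": needs_review,
    })
    return value, needs_review


# Illustrative use with an invented model and patient record.
annotation = ModelAnnotation(
    model_name="example_risk_model",
    construct=SexConstruct.SEX_ASSIGNED_AT_BIRTH,
    ehr_field="sex_assigned_at_birth",
)
audit_log = []
value, needs_review = resolve_input(annotation, {"legal_sex": "F"}, audit_log)
print(value, needs_review)
```

The design choice worth noting is that the function never falls back silently from one construct to another, for example from sex assigned at birth to legal sex. A missing value is surfaced and logged so that a clinician decides, and so that later audits can count how often the model ran without the variable it was validated on.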

Limitations and uncertainty

Evidence on how best to mitigate encoded sex/gender signals in foundation models is still evolving: resampling and multitask approaches partially reduce disparities in some imaging tasks but do not fully eliminate them [4]. EHR completeness studies reflect varied health systems and time periods; progress in interoperability (USCDI) and local data-improvement work will change auditability over the next few years [5][9]. Finally, regulatory guidances set expectations but do not substitute for system-level investment in data infrastructure and governance.

Takeaway for clinicians

Sex- and gender-related data gaps are not an abstract informatics problem — they are a practical patient-safety issue that weakens the ability to detect and fix algorithmic harms. Clinicians can press for better SOGI capture, require subgroup performance data when adopting devices and AI, and keep human oversight central to judgment in ambiguous cases. Those measures, combined with emerging regulatory expectations and interoperability work, are the immediate levers available to reduce sex- and gender-based risk as clinical AI continues to mature.

References

  1. FDA: Evaluation of Sex-Specific Data in Medical Device Clinical Studies, Guidance (2025)
  2. FDA: Artificial Intelligence in Software as a Medical Device (AI/ML SaMD) (2025)
  3. Glocker et al.: Risk of Bias in Chest Radiography Deep Learning Foundation Models (Radiology: AI, 2023)
  4. Glocker et al.: Algorithmic encoding of protected characteristics in chest X-ray models (eBioMedicine, 2023)
  5. McDowell et al.: Completeness of Sex and Gender Fields in EHRs (LGBT Health, 2025)
  6. National Academies (NASEM): Measuring Sex, Gender Identity, and Sexual Orientation (2022)
  7. Mohottige et al.: Considerations of sex as a binary variable in clinical algorithms (Nature Reviews Nephrology, 2024)
  8. Tsintsiper et al.: Evaluating Sex Bias in Clinical Reasoning by Large Language Models (arXiv, 2026)
  9. ONC / USCDI: SOGI and Sex Parameter for Clinical Use developments (USCDI / ONC resources)
  10. MRCT Center: SOGI Data Collection Checklist (2024)
