
Risk of Bias in Chest Radiography Deep Learning Foundation Models

Bibliographic Details
Published in: Radiology. Artificial Intelligence, 2023-11, Vol. 5 (6), p. e230060
Main Authors: Glocker, Ben; Jones, Charles; Roschewitz, Mélanie; Winzeck, Stefan
Format: Article
Language:English
Description
Summary: Purpose: To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. Materials and Methods: This Health Insurance Portability and Accountability Act-compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and a baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov-Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features with specific disparities in classification performance across patient subgroups. Results: Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the "no finding" label decreased between 6.8% and 7.8% for female patients, and performance in detecting "pleural effusion" decreased between 10.7% and 11.6% for Black patients. Conclusion: The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications. Keywords: Conventional Radiography, Computer Application-Detection/Diagnosis, Chest Radiography, Bias, Foundation Models. Published under a CC BY 4.0 license. See also commentary by Czum and Parr in this issue.
ISSN:2638-6100
DOI:10.1148/ryai.230060
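
Illustrative note: the summary describes applying two-sample Kolmogorov-Smirnov tests to low-dimensional feature projections in order to detect distribution shifts between patient subgroups. The following is a minimal sketch of that kind of test and is not the authors' code. It assumes the dimensionality reduction step has already produced one scalar projection per image. The function names, the synthetic data, and the asymptotic p-value approximation with a common small-sample correction are all assumptions made for illustration.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (an approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The alternating series converges poorly here; the p-value is ~1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def compare_subgroups(proj_a, proj_b):
    """Return (statistic, p_value) for one projected feature in two subgroups."""
    d = ks_statistic(proj_a, proj_b)
    return d, ks_pvalue(d, len(proj_a), len(proj_b))


# Hypothetical data: one projected feature per image for two subgroups,
# with the second subgroup shifted to mimic a distribution shift.
random.seed(0)
subgroup_a = [random.gauss(0.0, 1.0) for _ in range(500)]
subgroup_b = [random.gauss(1.0, 1.0) for _ in range(500)]

stat, p = compare_subgroups(subgroup_a, subgroup_b)
print(f"KS statistic = {stat:.3f}, p = {p:.3g}")
```

In practice a test like this would be repeated for each projected dimension and each pair of subgroups, usually with a correction for multiple comparisons. A library routine such as `scipy.stats.ks_2samp` would normally replace the hand-written functions.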