Gradient boosted trees with individual explanations: An alternative to logistic regression for viability prediction in the first trimester of pregnancy

Bibliographic Details
Published in: Computer Methods and Programs in Biomedicine, 2022-01, Vol. 213, Article 106520
Main Authors: Vaulet, Thibaut, Al-Memar, Maya, Fourie, Hanine, Bobdiwala, Shabnam, Saso, Srdjan, Pipi, Maria, Stalder, Catriona, Bennett, Phillip, Timmerman, Dirk, Bourne, Tom, De Moor, Bart
Format: Article
Language: English
Description
Summary:
• Interpretable machine learning models efficiently predict miscarriages.
• The gradient boosted trees algorithm offers substantial advantages for clinical modeling.
• The interpretation of clinical predictive models can be improved with decision paths.
• Post-hoc interpretability provides straightforward explanations for physicians.

Clinical models to predict first trimester viability are traditionally based on multivariable logistic regression (LR), which is not directly interpretable for non-statisticians such as physicians. Furthermore, LR requires complete datasets and pre-established variable specifications. In this study, we leveraged the internal non-linearity, feature selection and missing-value handling mechanisms of machine learning algorithms, together with a post-hoc interpretability strategy, as potential advantages over LR for clinical modeling.

The dataset included 1154 patients with 2377 individual scans and was obtained from a prospective observational cohort study conducted at a hospital in London, UK, from March 2014 to May 2019. The data were split into a training set (70%) and a test set (30%). Parsimonious and complete multivariable models were developed with two algorithms to predict first trimester viability at 11–14 weeks gestational age (GA): LR and light gradient boosted machine (LGBM). Missing values were handled by multiple imputation where appropriate. The SHapley Additive exPlanations (SHAP) framework was applied to derive individual explanations of the models.

The parsimonious LGBM model had discriminative and calibration performance similar to that of the parsimonious LR model (AUC 0.885 vs 0.860; calibration slope 1.19 vs 1.18). The complete models did not outperform the parsimonious models. Unlike LR, LGBM was robust to the presence of missing values and did not require multiple imputation. Decision path plots and feature importance analysis revealed different algorithm behaviors despite similar predictive performance.
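The two performance measures quoted above can be illustrated with a short, dependency-free Python sketch. It assumes the usual definitions: the AUC (c-statistic) is the probability that a randomly chosen event receives a higher predicted risk than a randomly chosen non-event, and the calibration slope is the coefficient b of the model's logit in a logistic recalibration model, logit P(y = 1) = a + b · logit(p). The function names and the toy data are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """C-statistic: share of (event, non-event) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def calibration_slope(y_true: Sequence[int], y_prob: Sequence[float], max_iter: int = 50) -> float:
    """Slope b of the recalibration model logit P(y=1) = a + b * logit(p), fitted by Newton-Raphson."""
    lp = [math.log(p / (1.0 - p)) for p in y_prob]  # logit of the evaluated model's predictions
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0  # score vector and information matrix
        for x, y in zip(lp, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g_a += y - mu
            g_b += (y - mu) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det  # Newton step: inverse information times score
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) < 1e-12 and abs(db) < 1e-12:
            break
    return b


# Hypothetical toy data (not from the study): three risk groups with observed event rates 0.2, 0.5, 0.8.
y = [1, 0, 0, 0, 0] + [1, 1, 0, 0] + [1, 1, 1, 1, 0]
p_calibrated = [0.2] * 5 + [0.5] * 4 + [0.8] * 5
# Same ranking, but every logit doubled (0.2 -> 1/17, 0.8 -> 16/17): over-confident predictions.
p_overconfident = [1 / 17] * 5 + [0.5] * 4 + [16 / 17] * 5

print(auc(y, p_calibrated), auc(y, p_overconfident))  # both 38/49, about 0.776
print(calibration_slope(y, p_calibrated), calibration_slope(y, p_overconfident))  # about 1.0 and 0.5
```

Because the AUC depends only on how predictions are ranked, both toy predictors share the same AUC, whereas the calibration slope separates them. Under this definition a slope above 1, as reported for both parsimonious models, suggests predictions that are slightly too moderate, while a slope below 1 suggests over-confident predictions.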
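Gradient boosted tree libraries such as LightGBM can handle missing values natively, which is why the LGBM model could be fitted without multiple imputation. A common mechanism is to learn, at each split, a default direction for cases whose value is missing. The Python sketch below illustrates that idea on a single feature, using exact split search and the second-order gain popularized by XGBoost; it is a simplified assumption-based illustration, not LightGBM's implementation (which works on histogram bins and differs in detail), and all names and data are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def split_gain(g_left: float, h_left: float, g_right: float, h_right: float, lam: float = 1.0) -> float:
    """Second-order split gain (the constant factor 1/2 and the complexity penalty are omitted)."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return score(g_left, h_left) + score(g_right, h_right) - score(g_left + g_right, h_left + h_right)


def best_split_with_missing(
    x: Sequence[Optional[float]], grad: Sequence[float], hess: Sequence[float], lam: float = 1.0
) -> Tuple[float, Optional[float], bool]:
    """Best rule 'x <= threshold'; missing values (None) go to whichever side yields the higher gain."""
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, grad, hess) if xi is not None)
    g_miss = sum(gi for xi, gi in zip(x, grad) if xi is None)
    h_miss = sum(hi for xi, hi in zip(x, hess) if xi is None)
    g_total = sum(gi for _, gi, _ in present)
    h_total = sum(hi for _, _, hi in present)

    best: Tuple[float, Optional[float], bool] = (-math.inf, None, True)
    g_left = h_left = 0.0
    for i in range(len(present) - 1):
        xi, gi, hi = present[i]
        g_left += gi
        h_left += hi
        next_x = present[i + 1][0]
        if next_x == xi:
            continue  # thresholds only between distinct observed values
        threshold = (xi + next_x) / 2
        g_right, h_right = g_total - g_left, h_total - h_left
        # Try both default directions for the missing values and keep the better one.
        for missing_left in (True, False):
            if missing_left:
                gain = split_gain(g_left + g_miss, h_left + h_miss, g_right, h_right, lam)
            else:
                gain = split_gain(g_left, h_left, g_right + g_miss, h_right + h_miss, lam)
            if gain > best[0]:
                best = (gain, threshold, missing_left)
    return best


# Hypothetical toy data: log-loss gradients at an initial prediction of 0.5 (g = p - y, h = p * (1 - p)).
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, None, None]
y = [0, 0, 0, 1, 1, 1, 1, 1]
grad = [0.5 - yi for yi in y]
hess = [0.25] * len(y)
gain, threshold, missing_left = best_split_with_missing(x, grad, hess)
print(threshold, missing_left)  # 6.5 False: missing cases follow the high-x (event) branch
```

In this toy example the missing cases behave like the high-value cases, so the learned rule routes them to that branch without any value being imputed.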
In the LR model, the main driving variable was the pre-specified interaction between fetal heart presence and mean sac diameter. For LGBM, the two most important variables were crown-rump length and a proxy variable reflecting the difference between expected and observed GA. Finally, whereas variable interactions must be specified upfront with LR, the SHAP framework ranked several interactions, learned automatically by the LGBM algorithm, among the most important features. Gradient boosted algorithms performed similarly to caref…
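SHAP explanations are built on Shapley values. The brute-force Python sketch below computes exact Shapley values for a hypothetical two-feature logit containing a fetal heart × mean sac diameter interaction term, using a single reference patient as the baseline. The coefficients, patient values and function names are invented for illustration, and SHAP's own explainers use background data and, for tree models, a much faster algorithm. The sketch shows how an interaction's contribution is shared between the features involved while the values still add up to the difference between the patient's and the reference's model output.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1, not counting feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_logit(z: List[float]) -> float:
    """Hypothetical LR-style logit with a pre-specified interaction (made-up coefficients)."""
    fetal_heart, msd = z  # fetal heart presence (0/1), mean sac diameter (mm)
    return -1.0 + 2.0 * fetal_heart - 0.1 * msd + 0.15 * fetal_heart * msd


patient = [1.0, 20.0]
reference = [0.0, 10.0]
phi = shapley_values(toy_logit, patient, reference)
print(phi)  # approximately [4.25, -0.25]
print(sum(phi), toy_logit(patient) - toy_logit(reference))  # both approximately 4.0
```

Here the interaction term adds 2.25 to the fetal heart attribution and 0.75 to the mean sac diameter attribution, so neither feature's Shapley value equals its main-effect coefficient contribution alone. Exact enumeration grows exponentially with the number of features, which is why practical explainers rely on model-specific algorithms or approximations.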
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2021.106520