
Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network

Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at externa...

Bibliographic Details
Published in:	Journal of the American Medical Informatics Association : JAMIA 2024-04, Vol.31 (5), p.1051-1061
Main Authors:	Naderalvojoud, Behzad; Curtin, Catherine M; Yanover, Chen; El-Hay, Tal; Choi, Byungjin; Park, Rae Woong; Tabuenca, Javier Gracia; Reeve, Mary Pat; Falconer, Thomas; Humphreys, Keith; Asch, Steven M; Hernandez-Boussard, Tina
Format:	Article
Language:	English
Subjects:	Data Science; Finland; Humans; Logistic Models; Medical Informatics; United Kingdom

cites	cdi_FETCH-LOGICAL-c288t-877133fd7de071bc75647541f262dc854cfd19b1f6dda9cc409ce281b6f57d83
container_end_page	1061
container_issue	5
container_start_page	1051
container_title	Journal of the American Medical Informatics Association : JAMIA
container_volume	31
creator	Naderalvojoud, Behzad; Curtin, Catherine M; Yanover, Chen; El-Hay, Tal; Choi, Byungjin; Park, Rae Woong; Tabuenca, Javier Gracia; Reeve, Mary Pat; Falconer, Thomas; Humphreys, Keith; Asch, Steven M; Hernandez-Boussard, Tina
description	Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at external sites. Our study introduces an innovative methodology for evaluating features during both the development phase and the validation, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability. Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts. Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients with POU of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and 0.69 (SD = 0.02) (averaged) in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site through external validation (P < .05). Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.
doi_str_mv	10.1093/jamia/ocae028
format	article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2932937368</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2932937368</sourcerecordid><originalsourceid>FETCH-LOGICAL-c288t-877133fd7de071bc75647541f262dc854cfd19b1f6dda9cc409ce281b6f57d83</originalsourceid><addsrcrecordid>eNo9kTtOxTAQRS0E4l_SIpc0AX-SOKFD_CUkCl5BFzn2-GFw4mA7IFgGK2AtrIzweCCNZm5xdEczF6E9Sg4pqfnRo-ysPPJKAmHVCtqkBRNZLfL71UmTUmQFYWIDbcX4SAgtGS_W0Qavcso4p5voY-ZfZdARz51vpcOd1-DwHHoI0tl32Vpn09sxtr2GAabWJ6yCjzGLNgE2INMYAMOLdKNM1vfY-ICHSU5k5uBlcgs2PuEhgLZqQSx2RDxG289xeoCvz9urs7tr3EN69eFpB60Z6SLsLuc2ml2cz06vspvby-vTk5tMsapKWSUE5dxooYEI2ipRlLkocmpYybSqilwZTeuWmlJrWSuVk1oBq2hbmkLoim-jg1_bIfjnEWJqOhsVOCd78GNsWM2nErz8QbNfdHF5ANMMwXYyvDWUND8pNIsUmmUKE7-_tB7bDvQ__fd2_g1Z_4nb</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2932937368</pqid></control><display><type>article</type><title>Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network</title><source>Oxford Journals - Connect here FIRST to enable access</source><creator>Naderalvojoud, Behzad ; Curtin, Catherine M ; Yanover, Chen ; El-Hay, Tal ; Choi, Byungjin ; Park, Rae Woong ; Tabuenca, Javier Gracia ; Reeve, Mary Pat ; Falconer, Thomas ; Humphreys, Keith ; Asch, Steven M ; Hernandez-Boussard, Tina</creator><creatorcontrib>Naderalvojoud, Behzad ; Curtin, Catherine M ; Yanover, Chen ; El-Hay, Tal ; Choi, Byungjin ; Park, Rae Woong ; Tabuenca, Javier Gracia ; Reeve, Mary Pat ; Falconer, Thomas ; Humphreys, Keith ; Asch, Steven M ; Hernandez-Boussard, Tina</creatorcontrib><description>Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. 
Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at external sites. Our study introduces an innovative methodology for evaluating features during both the development phase and the validation, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability. Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts. Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients with POU of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and 0.69 (SD = 0.02) (averaged) in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site through external validation (P < .05). Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. 
Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.</description><identifier>ISSN: 1067-5027</identifier><identifier>EISSN: 1527-974X</identifier><identifier>DOI: 10.1093/jamia/ocae028</identifier><identifier>PMID: 38412331</identifier><language>eng</language><publisher>England</publisher><subject>Data Science ; Finland ; Humans ; Logistic Models ; Medical Informatics ; United Kingdom</subject><ispartof>Journal of the American Medical Informatics Association : JAMIA, 2024-04, Vol.31 (5), p.1051-1061</ispartof><rights>The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c288t-877133fd7de071bc75647541f262dc854cfd19b1f6dda9cc409ce281b6f57d83</cites><orcidid>0000-0003-4429-5341 ; 0000-0001-6553-3455</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>315,786,790,27957,27958</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38412331$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Naderalvojoud, Behzad</creatorcontrib><creatorcontrib>Curtin, Catherine M</creatorcontrib><creatorcontrib>Yanover, Chen</creatorcontrib><creatorcontrib>El-Hay, Tal</creatorcontrib><creatorcontrib>Choi, Byungjin</creatorcontrib><creatorcontrib>Park, Rae Woong</creatorcontrib><creatorcontrib>Tabuenca, Javier Gracia</creatorcontrib><creatorcontrib>Reeve, Mary 
Pat</creatorcontrib><creatorcontrib>Falconer, Thomas</creatorcontrib><creatorcontrib>Humphreys, Keith</creatorcontrib><creatorcontrib>Asch, Steven M</creatorcontrib><creatorcontrib>Hernandez-Boussard, Tina</creatorcontrib><title>Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network</title><title>Journal of the American Medical Informatics Association : JAMIA</title><addtitle>J Am Med Inform Assoc</addtitle><description>Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at external sites. Our study introduces an innovative methodology for evaluating features during both the development phase and the validation, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability. Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts. Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients with POU of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. 
The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and 0.69 (SD = 0.02) (averaged) in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site through external validation (P < .05). Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.</description><subject>Data Science</subject><subject>Finland</subject><subject>Humans</subject><subject>Logistic Models</subject><subject>Medical Informatics</subject><subject>United Kingdom</subject><issn>1067-5027</issn><issn>1527-974X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNo9kTtOxTAQRS0E4l_SIpc0AX-SOKFD_CUkCl5BFzn2-GFw4mA7IFgGK2AtrIzweCCNZm5xdEczF6E9Sg4pqfnRo-ysPPJKAmHVCtqkBRNZLfL71UmTUmQFYWIDbcX4SAgtGS_W0Qavcso4p5voY-ZfZdARz51vpcOd1-DwHHoI0tl32Vpn09sxtr2GAabWJ6yCjzGLNgE2INMYAMOLdKNM1vfY-ICHSU5k5uBlcgs2PuEhgLZqQSx2RDxG289xeoCvz9urs7tr3EN69eFpB60Z6SLsLuc2ml2cz06vspvby-vTk5tMsapKWSUE5dxooYEI2ipRlLkocmpYybSqilwZTeuWmlJrWSuVk1oBq2hbmkLoim-jg1_bIfjnEWJqOhsVOCd78GNsWM2nErz8QbNfdHF5ANMMwXYyvDWUND8pNIsUmmUKE7-_tB7bDvQ__fd2_g1Z_4nb</recordid><startdate>20240419</startdate><enddate>20240419</enddate><creator>Naderalvojoud, Behzad</creator><creator>Curtin, Catherine M</creator><creator>Yanover, Chen</creator><creator>El-Hay, Tal</creator><creator>Choi, Byungjin</creator><creator>Park, Rae Woong</creator><creator>Tabuenca, Javier Gracia</creator><creator>Reeve, Mary Pat</creator><creator>Falconer, 
Thomas</creator><creator>Humphreys, Keith</creator><creator>Asch, Steven M</creator><creator>Hernandez-Boussard, Tina</creator><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0003-4429-5341</orcidid><orcidid>https://orcid.org/0000-0001-6553-3455</orcidid></search><sort><creationdate>20240419</creationdate><title>Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network</title><author>Naderalvojoud, Behzad ; Curtin, Catherine M ; Yanover, Chen ; El-Hay, Tal ; Choi, Byungjin ; Park, Rae Woong ; Tabuenca, Javier Gracia ; Reeve, Mary Pat ; Falconer, Thomas ; Humphreys, Keith ; Asch, Steven M ; Hernandez-Boussard, Tina</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c288t-877133fd7de071bc75647541f262dc854cfd19b1f6dda9cc409ce281b6f57d83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Data Science</topic><topic>Finland</topic><topic>Humans</topic><topic>Logistic Models</topic><topic>Medical Informatics</topic><topic>United Kingdom</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Naderalvojoud, Behzad</creatorcontrib><creatorcontrib>Curtin, Catherine M</creatorcontrib><creatorcontrib>Yanover, Chen</creatorcontrib><creatorcontrib>El-Hay, Tal</creatorcontrib><creatorcontrib>Choi, Byungjin</creatorcontrib><creatorcontrib>Park, Rae Woong</creatorcontrib><creatorcontrib>Tabuenca, Javier Gracia</creatorcontrib><creatorcontrib>Reeve, Mary Pat</creatorcontrib><creatorcontrib>Falconer, Thomas</creatorcontrib><creatorcontrib>Humphreys, Keith</creatorcontrib><creatorcontrib>Asch, Steven M</creatorcontrib><creatorcontrib>Hernandez-Boussard, 
Tina</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Naderalvojoud, Behzad</au><au>Curtin, Catherine M</au><au>Yanover, Chen</au><au>El-Hay, Tal</au><au>Choi, Byungjin</au><au>Park, Rae Woong</au><au>Tabuenca, Javier Gracia</au><au>Reeve, Mary Pat</au><au>Falconer, Thomas</au><au>Humphreys, Keith</au><au>Asch, Steven M</au><au>Hernandez-Boussard, Tina</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network</atitle><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle><addtitle>J Am Med Inform Assoc</addtitle><date>2024-04-19</date><risdate>2024</risdate><volume>31</volume><issue>5</issue><spage>1051</spage><epage>1061</epage><pages>1051-1061</pages><issn>1067-5027</issn><eissn>1527-974X</eissn><notes>ObjectType-Article-1</notes><notes>SourceType-Scholarly Journals-1</notes><notes>ObjectType-Feature-2</notes><notes>content type line 23</notes><abstract>Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at external sites. 
Our study introduces an innovative methodology for evaluating features during both the development phase and the validation, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability. Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts. Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients with POU of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and 0.69 (SD = 0.02) (averaged) in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site through external validation (P < .05). Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. 
Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.</abstract><cop>England</cop><pmid>38412331</pmid><doi>10.1093/jamia/ocae028</doi><tpages>11</tpages><orcidid>https://orcid.org/0000-0003-4429-5341</orcidid><orcidid>https://orcid.org/0000-0001-6553-3455</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1067-5027
ispartof	Journal of the American Medical Informatics Association : JAMIA, 2024-04, Vol.31 (5), p.1051-1061
issn	1067-5027; 1527-974X
language	eng
recordid	cdi_proquest_miscellaneous_2932937368
source	Oxford Journals - Connect here FIRST to enable access
subjects	Data Science; Finland; Humans; Logistic Models; Medical Informatics; United Kingdom
title	Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-21T13%3A44%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Towards%20global%20model%20generalizability:%20independent%20cross-site%20feature%20evaluation%20for%20patient-level%20risk%20prediction%20models%20using%20the%C2%A0OHDSI%20network&rft.jtitle=Journal%20of%20the%20American%20Medical%20Informatics%20Association%20:%20JAMIA&rft.au=Naderalvojoud,%20Behzad&rft.date=2024-04-19&rft.volume=31&rft.issue=5&rft.spage=1051&rft.epage=1061&rft.pages=1051-1061&rft.issn=1067-5027&rft.eissn=1527-974X&rft_id=info:doi/10.1093/jamia/ocae028&rft_dat=%3Cproquest_cross%3E2932937368%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c288t-877133fd7de071bc75647541f262dc854cfd19b1f6dda9cc409ce281b6f57d83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2932937368&rft_id=info:pmid/38412331&rfr_iscdi=true