
Polygenic risk prediction: why and when out-of-sample prediction R² can exceed SNP-based heritability

Bibliographic Details
Published in: American Journal of Human Genetics, 2023-07, Vol. 110 (7), p. 1207
Main Authors: Wang, Xiaotong, Walker, Alicia, Revez, Joana A, Ni, Guiyan, Adams, Mark J, McIntosh, Andrew M, Visscher, Peter M, Wray, Naomi R
Format: Article
Language: English
Description
Summary: In polygenic score (PGS) analysis, the coefficient of determination (R²) is a key statistic for evaluating efficacy. R² is the proportion of phenotypic variance explained by the PGS, calculated in a cohort that is independent of the genome-wide association study (GWAS) that provided estimates of allelic effect sizes. The SNP-based heritability (h²SNP, the proportion of total phenotypic variance attributable to all common SNPs) is the theoretical upper limit of the out-of-sample prediction R². However, in real data analyses R² has been reported to exceed h²SNP, which occurs in parallel with the observation that h²SNP estimates tend to decline as the number of cohorts being meta-analyzed increases. Here, we quantify why and when these observations are expected. Using theory and simulation, we show that if heterogeneity in cohort-specific h²SNP exists, or if genetic correlations between cohorts are less than one, h²SNP estimates can decrease as the number of cohorts being meta-analyzed increases. We derive conditions under which the out-of-sample prediction R² will be greater than h²SNP and show the validity of our derivations with real data from a binary trait (major depression) and a continuous trait (educational attainment). Our research calls for a better approach to integrating information from multiple cohorts to address issues of between-cohort heterogeneity.
ISSN: 1537-6605
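The two quantities contrasted in the summary can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the authors' derivation or code. It assumes independent standardized SNPs, the same h²SNP in every cohort, a single pairwise genetic correlation rg between any two cohorts, and infinite GWAS sample sizes. Equal-weight averaging of cohort effects stands in for meta-analysis. The names M, H2, RG, K_MAX, N_TARGET and all helper functions are hypothetical. The sketch computes R² as the proportion of phenotypic variance explained by the PGS in an independent target cohort. It also shows that, when rg < 1, the heritability implied by the meta-analysed effects falls as more cohorts are combined.

```python
import numpy as np

# Toy model, an illustrative assumption and NOT the authors' derivation:
# M independent, standardized SNPs; every cohort (K discovery cohorts plus one
# target cohort) has its own true per-SNP effects with variance H2 / M, and any
# two cohorts have genetic correlation RG. GWAS sample sizes are treated as
# infinite, so estimated effects equal true effects.
M = 2000         # number of SNPs (hypothetical)
H2 = 0.5         # SNP-based heritability within every cohort (hypothetical)
RG = 0.6         # genetic correlation between any two cohorts (hypothetical)
K_MAX = 20       # largest number of discovery cohorts meta-analysed
N_TARGET = 2000  # individuals in the independent target cohort

rng = np.random.default_rng(2023)


def cohort_effects(k: int) -> np.ndarray:
    """Return a (k, M) array of true effects, one row per cohort.

    Row = sqrt(RG) * shared + sqrt(1 - RG) * cohort-specific, so each row has
    per-SNP variance H2 / M and any two rows have correlation RG.
    """
    sd = np.sqrt(H2 / M)
    shared = rng.normal(0.0, sd, size=M)
    specific = rng.normal(0.0, sd, size=(k, M))
    return np.sqrt(RG) * shared + np.sqrt(1.0 - RG) * specific


def meta_h2(effects: np.ndarray) -> float:
    """h2 implied by the equal-weight average of the rows of `effects`.

    With independent standardized SNPs, Var(X @ beta) = sum(beta ** 2).
    """
    beta_meta = effects.mean(axis=0)
    return float(np.sum(beta_meta ** 2))


def expected_meta_h2(k: int) -> float:
    """Expectation of meta_h2 under the toy model: H2 * (RG + (1 - RG) / k)."""
    return H2 * (RG + (1.0 - RG) / k)


def out_of_sample_r2(pgs: np.ndarray, y: np.ndarray) -> float:
    """Squared correlation of PGS and phenotype in an independent cohort."""
    return float(np.corrcoef(pgs, y)[0, 1] ** 2)


# Rows 0..K_MAX-1 are discovery cohorts; the last row is the target cohort.
effects = cohort_effects(K_MAX + 1)
discovery, target_beta = effects[:K_MAX], effects[K_MAX]

# The h2 implied by the meta-analysed effects shrinks toward RG * H2 as k grows.
h2_by_k = {k: meta_h2(discovery[:k]) for k in (1, 2, 5, 10, 20)}

# Simulate the target cohort: standardized genotypes (a simplification; real
# genotypes are 0/1/2 allele counts) plus non-genetic noise of variance 1 - H2.
X = rng.standard_normal((N_TARGET, M))
y = X @ target_beta + rng.normal(0.0, np.sqrt(1.0 - H2), size=N_TARGET)
pgs = X @ discovery.mean(axis=0)
r2 = out_of_sample_r2(pgs, y)

for k, h2 in h2_by_k.items():
    print(f"k={k:2d}  meta-analysed h2={h2:.3f}  expected={expected_meta_h2(k):.3f}")
print(f"out-of-sample R2 with k={K_MAX}: {r2:.3f}")
```

Under these simplifications, the implied heritability has expectation h²SNP × (rg + (1 - rg)/k), so it decreases with k whenever rg < 1. Heterogeneous cohort-specific h²SNP, the other source named in the summary, could be added by giving each row its own variance. The paper's conditions for R² > h²SNP are not reproduced in this sketch.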