A framework for automated gene selection in genomic applications

Bibliographic Details
Published in: Genetics in Medicine 2021-10, Vol. 23 (10), p. 1993-1997
Main Authors: Lazo de la Vega, L., Yu, W., Machini, K., Austin-Tse, C.A., Hao, L., Blout Zawatsky, C.L., Mason-Suares, H., Green, R.C., Rehm, H.L., Lebo, M.S.
Format: Article
Language: English
Description
Summary: An efficient framework to identify disease-associated genes is needed to evaluate genomic data for both individuals with an unknown disease etiology and those undergoing genomic screening. Here, we propose a framework for gene selection used in genomic analyses, including applications limited to genes with strong or established evidence levels and applications including genes with less or emerging evidence of disease association. We extracted genes with evidence for gene–disease association from the Human Gene Mutation Database, OMIM, and ClinVar to build a comprehensive gene list of 6,145 genes. Next, we applied stringent filters in conjunction with computationally curated evidence (DisGeNET) to create a restrictive list limited to 3,929 genes with stronger disease associations. When compared to manual gene curation efforts, including the Clinical Genome Resource, genes with strong or definitive disease associations are included in both gene lists at high percentages, while genes with limited evidence are largely removed. We further confirmed the utility of this approach in identifying pathogenic and likely pathogenic variants in 45 genomes. Our approach efficiently creates highly sensitive gene lists for genomic applications, while remaining dynamic and updatable, enabling time savings in genomic applications.
ISSN: 1098-3600, 1530-0366
DOI: 10.1038/s41436-021-01213-x