
Pandagma: a tool for identifying pan-gene sets and gene families at desired evolutionary depths and accommodating whole-genome duplications


Bibliographic Details
Published in: Bioinformatics (Oxford, England), 2024-09, Vol. 40 (9)
Main Authors: Cannon, Steven B; Lee, Hyun-Oh; Weeks, Nathan T; Berendzen, Joel
Format: Article
Language: English
Description
Summary: Identification of allelic or corresponding genes (pan-genes) within a species or genus is important for discovery of biologically significant genetic conservation and variation. Similarly, identification of orthologs (gene families) across wider evolutionary distances is important for understanding the genetic basis for similar or differing traits. Especially in plants, several complications make identification of pan-genes and gene families challenging, including whole-genome duplications, evolutionary rate differences among lineages, and varying qualities of assemblies and annotations. Here, we document and distribute a set of workflows that we have used to address these problems.

Results: Pandagma is a set of configurable workflows for identifying and comparing pan-gene sets and gene families for annotation sets from eukaryotic genomes, using a combination of homology, synteny, and expected rates of synonymous change in coding sequence.

Availability and implementation: The Pandagma workflows, example configurations, implementation details, and scripts for retrieving public datasets are available at https://github.com/legumeinfo/pandagma
ISSN: 1367-4811, 1367-4803
DOI: 10.1093/bioinformatics/btae526