Loading…
Pandagma: a tool for identifying pan-gene sets and gene families at desired evolutionary depths and accommodating whole-genome duplications
Abstract Summary Identification of allelic or corresponding genes (pan-genes) within a species or genus is important for discovery of biologically significant genetic conservation and variation. Similarly, identification of orthologs (gene families) across wider evolutionary distances is important f...
Saved in:
Published in: | Bioinformatics (Oxford, England) England), 2024-09, Vol.40 (9) |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Abstract
Summary
Identification of allelic or corresponding genes (pan-genes) within a species or genus is important for discovery of biologically significant genetic conservation and variation. Similarly, identification of orthologs (gene families) across wider evolutionary distances is important for understanding the genetic basis for similar or differing traits. Especially in plants, several complications make identification of pan-genes and gene families challenging, including whole-genome duplications, evolutionary rate differences among lineages, and varying qualities of assemblies and annotations. Here, we document and distribute a set of workflows that we have used to address these problems.
Results
Pandagma is a set of configurable workflows for identifying and comparing pan-gene sets and gene families for annotation sets from eukaryotic genomes, using a combination of homology, synteny, and expected rates of synonymous change in coding sequence.
Availability and implementation
The Pandagma workflows, example configurations, implementation details, and scripts for retrieving public datasets, are available at https://github.com/legumeinfo/pandagma |
---|---|
ISSN: | 1367-4811 1367-4803 1367-4811 |
DOI: | 10.1093/bioinformatics/btae526 |