Prediction of protein functional residues from sequence by probability density estimation

Motivation: The prediction of ligand-binding residues or catalytically active residues of a protein may give important hints that can guide further genetic or biochemical studies. Existing sequence-based prediction methods mostly rank residue positions by evolutionary conservation calculated from a...

Full description

Saved in:

Bibliographic Details
Published in:	Bioinformatics 2008-03, Vol.24 (5), p.613-620
Main Authors:	Fischer, J. D., Mayer, C. E., Söding, J.
Format:	Article
Language:	eng
Subjects:	Biological and medical sciences Catalytic Domain Fundamental and applied biological sciences. Psychology General aspects Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) Probability Proteins - chemistry Proteins - physiology
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Motivation: The prediction of ligand-binding residues or catalytically active residues of a protein may give important hints that can guide further genetic or biochemical studies. Existing sequence-based prediction methods mostly rank residue positions by evolutionary conservation calculated from a multiple sequence alignment of homologs. A problem hampering more wide-spread application of these methods is the low per-residue precision, which at 20% sensitivity is around 35% for ligand-binding residues and 20% for catalytic residues. Results: We combine information from the conservation at each site, its amino acid distribution, as well as its predicted secondary structure (ss) and relative solvent accessibility (rsa). First, we measure conservation by how much the amino acid distribution at each site differs from the distribution expected for the predicted ss and rsa states. Second, we include the conservation of neighboring residues in a weighted linear score by analytically optimizing the signal-to-noise ratio of the total score. Third, we use conditional probability density estimation to calculate the probability of each site to be functional given its conservation, the observed amino acid distribution, and the predicted ss and rsa states. We have constructed two large data sets, one based on the Catalytic Site Atlas and the other on PDB SITE records, to benchmark methods for predicting functional residues. The new method FRcons predicts ligand-binding and catalytic residues with higher precision than alternative methods over the entire sensitivity range, reaching 50% and 40% precision at 20% sensitivity, respectively. Availability: Server: http://frpred.tuebingen.mpg.de. Data sets: ftp://ftp.tuebingen.mpg.de/pub/protevo/FRpred/ Contact: soeding@lmb.uni-muenchen.de Supplementary information: Supplementary data are available at Bioinformatics Online.
ISSN:	1367-4803 1460-2059 1367-4811