DBPboost: A method of classification of DNA-binding proteins based on improved differential evolution algorithm and feature extraction

Bibliographic Details
Published in: Methods (San Diego, Calif.), 2024-03, Vol. 223, p. 56-64
Main Authors: Sun, Ailun; Li, Hongfei; Dong, Guanghui; Zhao, Yuming; Zhang, Dandan
Format: Article
Language: English
Description
Summary:
• Combines a novel protein feature extraction approach with PSSM matrices and commonly used feature extraction methods.
• Optimizes the differential evolution algorithm used for feature fusion.
• Applies feature selection twice: once during feature extraction and again after feature fusion.
• Optimizes the XGBoost algorithm to enhance classification efficiency.

DNA-binding proteins are a class of proteins that interact with DNA molecules through physical and chemical interactions. Their main functions include regulating gene expression and maintaining chromosome structure and stability. They play a crucial role in cellular and molecular biology because they are essential for maintaining normal cellular physiological functions and for adapting to environmental changes. The prediction of DNA-binding proteins has long been an active topic in bioinformatics. The key to classifying DNA-binding proteins accurately is to find suitable feature sources and to exploit the information they contain. Although many models for predicting DNA-binding proteins already exist, there is still room for improvement in mining feature source information and in the calculation methods. In this study, we created a model called DBPboost to better identify DNA-binding proteins. The innovation of this study is threefold: it uses eight feature extraction methods; it improves the feature selection step by selecting some features first and then performing feature selection again after feature fusion; and it optimizes the differential evolution algorithm used in feature fusion, which improves fusion performance. The experimental results show that the model achieves a prediction accuracy of 89.32% and a sensitivity of 89.01% on the UniSwiss dataset, which is better than most existing models.
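The summary describes differential evolution being used to fuse several feature sets before classification. As a rough illustration of that idea only, the sketch below uses the classic DE/rand/1/bin scheme (not the improved variant the paper proposes, whose details are not given in this record) to search for one weight per feature group. Everything here is an assumption made for illustration: the toy data, the helper names `make_toy_data`, `fuse`, and `error_rate`, the parameter values, and the nearest-centroid classifier that stands in for XGBoost.

```python
import random

def differential_evolution(fitness, dim, bounds=(0.0, 1.0), pop_size=20,
                           F=0.5, CR=0.9, generations=50, seed=0):
    """Classic DE/rand/1/bin minimiser (illustrative, not the paper's variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(hi, max(lo, v))  # clip to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            s = fitness(trial)
            if s <= scores[i]:  # greedy selection: keep the better vector
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def make_toy_data(n=60, seed=1):
    """Hypothetical data: group 0 tracks the label, group 1 is pure noise."""
    rng = random.Random(seed)
    data = []
    for k in range(n):
        label = k % 2
        informative = [label + rng.gauss(0, 0.3)]
        noise = [rng.gauss(0, 3.0), rng.gauss(0, 3.0)]
        data.append(([informative, noise], label))
    return data

def fuse(groups, weights):
    """Scale each feature group by its weight and concatenate the result."""
    return [w * x for w, g in zip(weights, groups) for x in g]

def error_rate(weights, train, valid):
    """Validation error of a nearest-centroid classifier on fused features."""
    cents = {}
    for label in (0, 1):
        rows = [fuse(g, weights) for g, y in train if y == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    wrong = 0
    for g, y in valid:
        x = fuse(g, weights)
        pred = min(cents, key=lambda l: sum((p - q) ** 2
                                            for p, q in zip(x, cents[l])))
        wrong += pred != y
    return wrong / len(valid)

data = make_toy_data()
train, valid = data[:40], data[40:]
best_w, best_err = differential_evolution(
    lambda w: error_rate(w, train, valid), dim=2)
print("fusion weights:", best_w, "validation error:", best_err)
```

In a fuller pipeline, the fitness function would more plausibly be a cross-validated score of the real classifier, and a second feature selection pass would follow the fusion step, as the summary describes.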
ISSN: 1046-2023, 1095-9130
DOI: 10.1016/j.ymeth.2024.01.005