
Improving quantitative prediction of protein subcellular locations in fluorescence images through deep generative models

Bibliographic Details
Published in: Computers in biology and medicine, 2024-09, Vol. 179, p. 108913, Article 108913
Main Authors: Li, Yu; Zeng, Guo-Hua; Liang, Yong-Jia; Yang, Hong-Rui; Zhu, Xi-Liang; Zhai, Yu-Jia; Duan, Li-Xia; Xu, Ying-Ying
Format: Article
Language: English
Description
Summary: Machine learning has been used to recognize protein localization at the subcellular level, which greatly facilitates protein function studies, especially for multi-label proteins that localize in more than one organelle. However, existing work mostly addresses the qualitative classification of protein subcellular locations and ignores the fraction of a multi-label protein in each location. About 50% of proteins are multi-label proteins, and neglecting this quantitative information severely restricts the understanding of their spatial distribution and functional mechanisms. One reason for the lack of quantitative studies is the shortage of quantitative annotations. To address this data shortage, we proposed a generative model, PLocGAN, which generates cell images conditioned on a quantitative annotation of the fluorescence distribution. The model was a conditional generative adversarial network in which the condition learning used partial label learning to overcome the lack of training labels, allowing training with only qualitative labels. It also used contrastive learning to enhance the diversity of the generated images. We assessed PLocGAN on four pixel-fused synthetic datasets and one real dataset, and demonstrated that the model generated images with good fidelity and diversity, outperforming existing state-of-the-art generative methods. To verify the utility of PLocGAN for the quantitative prediction of protein subcellular locations, we replaced the training images with generated quantitative images, built prediction models, and found that the generated images improved the quantitative estimation. This work demonstrates the effectiveness of deep generative models in bioimage analysis and provides a new solution for quantitative subcellular proteomics.
• PLocGAN can generate cell images with given quantitative fluorescence conditions.
• Training PLocGAN requires only qualitatively labelled images.
• PLocGAN performs better than state-of-the-art methods on synthetic and real datasets.
• The generated images enhance the quantitative prediction of protein subcellular locations.
ISSN:0010-4825
1879-0534
DOI:10.1016/j.compbiomed.2024.108913
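
The summary mentions "pixel-fused synthetic datasets" but this record does not say how they were constructed. As a rough illustration only, the sketch below assumes that a fused image is a convex combination of two intensity-normalized single-location images, so that the mixing weight can serve as the quantitative annotation. The function name `fuse_pixels` and its parameters are hypothetical and are not taken from the article.

```python
import numpy as np


def fuse_pixels(img_a, img_b, frac_a):
    """Blend two single-location images into one image with a known fraction.

    Assumption (not from the article): each input is scaled to unit total
    intensity, then mixed with weights frac_a and 1 - frac_a.
    Returns the fused image and the two-element fraction label.
    """
    if not 0.0 <= frac_a <= 1.0:
        raise ValueError("frac_a must lie in [0, 1]")
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    if a.sum() <= 0 or b.sum() <= 0:
        raise ValueError("images must have positive total intensity")
    # Normalize so the weight refers to the share of total fluorescence.
    a = a / a.sum()
    b = b / b.sum()
    fused = frac_a * a + (1.0 - frac_a) * b
    label = np.array([frac_a, 1.0 - frac_a])
    return fused, label


# Usage with random stand-in images (real data would be fluorescence images).
rng = np.random.default_rng(0)
image_a = rng.random((4, 4))
image_b = rng.random((4, 4))
fused_image, fraction_label = fuse_pixels(image_a, image_b, 0.3)
```

Under this assumption the label is known exactly by construction, which is what would make such synthetic images usable for checking quantitative predictions.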