
Research On Protein Phosphorylation Site Prediction Algorithms Using Capsule Network And Unsupervised Learning

Posted on: 2024-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: S X Wang
GTID: 2530306908982979
Subject: Control Science and Engineering
Abstract/Summary:
Post-translational modifications (PTMs) are the foundation for proteins to perform biological functions. As one of the most important and widely studied PTMs, protein phosphorylation plays a critical role in many cellular processes, such as DNA damage repair, transcriptional regulation, signal transduction, and apoptosis. Moreover, abnormal phosphorylation can cause protein functional disorders, which are closely related to major diseases such as cancer. Many of the kinases and phosphatases involved in phosphorylation have become attractive targets for drug development. Accurate identification of phosphorylation sites helps reveal the diversity and activity status of proteins at a deeper level, provides an essential basis for comprehensively elucidating the impact of phosphorylation on protein functions and regulatory mechanisms, and promotes the annotation of genome sequence data and the development of new drugs.

Experimental methods for identifying protein phosphorylation sites have been widely researched over the past few decades, but cost constraints make them difficult to apply at the whole-proteome level. Computational biology and machine learning have become effective tools for understanding cellular processes and are continuously improving the efficiency and reliability of phosphorylation site prediction. However, because of the complexity of the molecular processes involved and the limitations of the data available for training models, predicting phosphorylation sites remains a challenging task. This study therefore presents a novel framework for identifying protein phosphorylation sites based on capsule networks and unsupervised learning. The main work is as follows:

(1) To address the narrow applicability and low accuracy of phosphorylation site identification tools for prokaryotes, this study proposes EcapsP, a prokaryotic protein phosphorylation site prediction model based on a self-attention capsule network. To solve the poor scalability and large parameter count of dynamic routing in capsule networks, we design a routing algorithm, inspired by the self-attention mechanism, that captures global information from the input sequence. This is a more reliable consistency learning method that reduces computational complexity while enriching the network’s representation capability. To address the weak robustness of the model, EcapsP improves prediction accuracy and stability by introducing shortcuts and unconditional reconfiguration. In independent testing, EcapsP outperforms other deep learning tools, particularly in terms of the Matthews correlation coefficient, where it improves on them by at least 7%. Furthermore, EcapsP is the first computational tool to predict prokaryotic phosphorylation sites on tyrosine residues.

(2) To enhance prediction performance in the presence of imbalanced positive and negative samples, this study proposes a protein sequence enhancement module called PSGAN, which is based on a generative adversarial model and incorporates the Wasserstein distance and the Proximal Policy Optimization (PPO) algorithm. The method replaces the Jensen-Shannon divergence with the Wasserstein distance as the action reward feedback for the generator and uses proximal policy optimization to design the objective function of the adversarial model. Furthermore, given the self-attention mechanism’s excellent ability to model long-range dependencies in sequences, a Generative Pre-trained Transformer (GPT) model is adopted as the generator architecture within the generative adversarial network. Finally, transfer learning is employed to address the limitations of adversarial generative models in few-shot learning. According to the experimental results, PSGAN significantly outperforms multiple imbalanced-data processing strategies and extracts the intrinsic patterns of kinase-specific sequences through adversarial transfer learning. Compared with existing phosphorylation site prediction tools, the EcapsP model built on PSGAN shows a significant performance improvement.

(3) A multi-label kinase predictor based on contrastive learning, SMPcaps, was developed to address the lack of kinase information for phosphorylation substrates. Unlike machine learning methods that perform single-label classification, SMPcaps defines the kinase-motif correspondence as a multi-label problem and performs prediction through a motif classification model, rather than training a separate model for each known kinase family. In addition, based on the Siamese network, a sequence embedding representation that helps distinguish phosphorylation substrates is constructed by introducing a contrastive loss function that incorporates spatial angle information. To prevent the loss of sequence position information during embedding, the physicochemical information encoding is then combined with the Siamese embedding and fed into a multi-label classification model. Finally, the kinase system evolutionary loss function is incorporated into the prediction model to establish connections among kinase families. Experimental results demonstrate that both the Siamese embedding and the kinase system evolutionary loss function effectively improve prediction accuracy. Compared with other kinase-specific phosphorylation prediction tools, SMPcaps shows a significant performance improvement.
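The abstract reports its headline comparison in terms of the Matthews correlation coefficient (MCC), a natural choice when positive and negative samples are imbalanced. As a minimal sketch of why, the Python below computes MCC from a confusion matrix using its standard definition; the function names and the toy labels are illustrative assumptions and are not taken from the thesis or from EcapsP.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = phosphorylated site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced toy set: 10 positive sites, 90 negative sites.
y_true = [1] * 10 + [0] * 90
# A trivial predictor that labels every site as negative.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", matthews_corrcoef(*counts))
```

On this toy set the all-negative predictor is correct on 90 of 100 samples yet carries no information about the positive class, which MCC reflects and plain accuracy does not.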
Keywords/Search Tags: protein phosphorylation, capsule network, prokaryote, generative adversarial network, unsupervised learning