
Protein Complex Identification Based On Representation Learning And Contrastive Learning

Posted on: 2024-04-22   Degree: Master   Type: Thesis
Country: China   Candidate: P X Zhou
GTID: 2530307295951389   Subject: Engineering

Abstract/Summary:
Proteins are the fundamental chemical building blocks of living organisms, and they play critical roles in almost all biological processes. Studies have shown that proteins do not exist in isolation within cells but often form interacting groups, known as complexes, that participate in a wide range of biological activities. The recognition and discovery of protein complexes are therefore essential for in-depth research in the life sciences. With the development of high-throughput technologies, protein interaction data keep accumulating and form large-scale protein-protein interaction (PPI) networks. Using computational methods to identify complexes from these networks can greatly improve identification efficiency, and this is currently one of the research hotspots in bioinformatics. To explore the characteristics of protein networks in depth, make full use of additional biological resources, and reduce the impact of network noise on complex identification, this thesis carries out research in three areas, as detailed below.

(1) Existing methods do not sufficiently use or integrate the protein interaction network together with biological attribute information. To address this, this thesis proposes GHAE, a complex identification method based on a heterogeneous protein information network. First, a heterogeneous protein information network is constructed that integrates Gene Ontology (GO) attribute information with network topology features, and heterogeneous graph representation learning is used to learn protein feature representations that incorporate this attribute information. Then, the interaction strength of each protein pair in the PPI network is quantified from the protein embeddings, and protein complexes are identified on the resulting weighted network. Experimental results show that GHAE can integrate valuable biological information effectively, and that heterogeneous graph representation learning can explore and combine protein and Gene Ontology attributes to improve complex identification performance.

(2) Existing methods insufficiently explore both the local features of complex protein networks and the global information of the network itself, and they typically ignore higher-order interactions between proteins. To address this, this thesis proposes CSPI, a complex identification method based on contrastive graph representation learning. First, a mixed-hop neighborhood aggregator is used to capture higher-order correlations among proteins in the network. Meanwhile, a joint contrastive learning task based on the structural information of the protein network itself is designed for self-supervised training, so that the learned protein node representations integrate both the local features and the global information of the network. Experimental results demonstrate that CSPI fully exploits the topological structure of the protein network, achieves higher F-values than existing methods, and is suitable for large-scale protein networks.

(3) To make full use of the topological features of known protein complexes and thereby improve the efficiency and accuracy of complex identification, this thesis proposes CSNE-RF, a complex identification method based on topological feature fusion. First, the topological features of known complexes are mined with a contrastive graph representation learning method. Then, these topological features are fused and used to train SLPC and random forest classifiers that screen protein complexes. Experimental results show that CSNE-RF can effectively mine the topological features of standard complexes. Compared with existing methods, this topological feature fusion approach improves identification ability and identifies complexes with biological significance.

In summary, this thesis identifies protein complexes using representation learning and contrastive learning. For scenarios that draw on different biological resources, it proposes three protein complex identification methods that address problems in current approaches, further improving identification performance and identifying biologically significant complexes. These methods also broaden the research ideas available to researchers and help advance the field of complex identification.
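The weighting step in method (1) turns learned protein embeddings into edge strengths on the PPI network. The sketch below assumes cosine similarity as the strength measure, and it uses a simple stand-in for complex detection (drop weak edges, keep connected components). The abstract does not say which similarity or clustering GHAE actually uses, so `weight_edges`, `candidate_complexes`, `threshold`, and `min_size` are all hypothetical names and parameters.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weight_edges(edges, emb):
    """Hypothetical: weight each PPI edge (p, q) by its endpoints' embedding similarity."""
    return {(p, q): cosine(emb[p], emb[q]) for p, q in edges}


def candidate_complexes(weighted, threshold=0.5, min_size=3):
    """Stand-in clustering (not GHAE's): drop weak edges, return connected components."""
    adj = defaultdict(set)
    for (p, q), w in weighted.items():
        if w >= threshold:
            adj[p].add(q)
            adj[q].add(p)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search over strong edges
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) >= min_size:
            groups.append(sorted(comp))
    return groups
```

For example, with toy 2-D embeddings where A, B, C point one way and D, E another, the weak C–D edge is pruned and `candidate_complexes(w, min_size=2)` returns `[["A", "B", "C"], ["D", "E"]]`. In practice a dedicated weighted-graph clustering algorithm would replace the component step.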
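Method (2) mentions a mixed-hop neighborhood aggregator. The sketch below shows the core idea as it is commonly formulated (in the style of MixHop, which is an assumption here). Node features are propagated with a normalized adjacency matrix Â = D^-1/2 (A + I) D^-1/2, and the blocks X, ÂX, Â²X are concatenated so that each protein sees its 0-, 1-, and 2-hop neighborhoods side by side. The learnable per-hop weight matrices and nonlinearities of a real layer are omitted. The function names are hypothetical.

```python
import math


def normalized_adjacency(n, edges):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mixed_hop(adj_norm, x, powers=(0, 1, 2)):
    """Concatenate A_hat^p X for each hop p in `powers` (no learned weights)."""
    hops, current = {}, x
    for p in range(max(powers) + 1):
        hops[p] = current  # hops[p] holds A_hat^p X
        if p < max(powers):
            current = matmul(adj_norm, current)
    return [sum((hops[p][i] for p in powers), []) for i in range(len(x))]
```

On a two-node graph with one edge and features `[[1.0], [0.0]]`, the output is `[[1.0, 0.5, 0.5], [0.0, 0.5, 0.5]]`: each row keeps its own feature and appends the 1-hop and 2-hop averages. Concatenation, rather than summation, keeps the hop-specific signals separable for later layers.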
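The abstract does not specify the contrastive objective that lets CSPI's node representations combine local features with global network information. One widely used local-global formulation is Deep Graph Infomax (DGI). DGI builds a graph summary vector and uses a discriminator that scores real node embeddings against the summary as positives and embeddings from a corrupted graph (node features shuffled) as negatives. The sketch below assumes that formulation, with a plain dot-product discriminator in place of DGI's learned bilinear form. It is an illustration only, not CSPI's actual loss.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, seed=0):
    """DGI-style corruption: permute node feature rows, keeping the graph structure."""
    rows = list(x)
    random.Random(seed).shuffle(rows)
    return rows


def readout(h):
    """Global summary: element-wise sigmoid of the mean node embedding."""
    n, d = len(h), len(h[0])
    return [sigmoid(sum(row[k] for row in h) / n) for k in range(d)]


def local_global_loss(h_real, h_corrupt):
    """Binary cross-entropy: real nodes should agree with the summary, corrupted ones should not.

    The discriminator is a dot product, i.e. a bilinear form with an identity matrix (assumption).
    """
    s = readout(h_real)
    eps = 1e-12  # guards log(0)
    loss = 0.0
    for row in h_real:
        loss -= math.log(sigmoid(sum(a * b for a, b in zip(row, s))) + eps)
    for row in h_corrupt:
        loss -= math.log(1.0 - sigmoid(sum(a * b for a, b in zip(row, s))) + eps)
    return loss / (len(h_real) + len(h_corrupt))
```

In a full pipeline, `h_corrupt` would come from running the same encoder (for example, the mixed-hop aggregator) on `corrupt(x)` over the unchanged adjacency. Minimizing the loss then pushes node embeddings to carry information about the whole network.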
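Method (3) fuses topological features of candidate subgraphs before classification. The abstract does not define which features CSNE-RF fuses, so the sketch below is a hypothetical fusion. For a candidate set of proteins, it concatenates the mean of the members' learned embeddings with two classic subgraph statistics, size and edge density. The names `density` and `fused_features` are illustrative. Edges are assumed to be unique undirected pairs.

```python
def density(members, edges):
    """Fraction of possible edges present inside the candidate subgraph."""
    k = len(members)
    if k < 2:
        return 0.0
    inside = set(members)
    m = sum(1 for p, q in edges if p in inside and q in inside)
    return 2.0 * m / (k * (k - 1))


def fused_features(members, edges, emb):
    """Hypothetical fusion: mean member embedding + [size, density]."""
    d = len(next(iter(emb.values())))
    mean = [sum(emb[p][k] for p in members) / len(members) for k in range(d)]
    return mean + [float(len(members)), density(members, edges)]
```

Feature vectors built this way for known complexes (positives) and for random or non-complex subgraphs (negatives) could then train an off-the-shelf classifier, for instance scikit-learn's `RandomForestClassifier` via `fit(X, y)`. This pairing is an assumption about how such a screening step might be set up, not a description of the thesis's SLPC and random forest configuration.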
Keywords/Search Tags: Protein Complex Identification, Graph Neural Network, Representation Learning, Contrastive Learning