
Protein Complex Detection In Human PPI Networks Based On Supervised Learning Method

Posted on: 2018-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Zhou
Full Text: PDF
GTID: 2310330536960947
Subject: Computer application technology
Abstract/Summary:
Protein complexes are essential functional units of the cell, in which several proteins with similar functions assemble and act together. Identifying these complexes is vital to understanding cellular organization and function. With the development of human genomics research and high-throughput technology, a large amount of protein-protein interaction (PPI) data has been generated, resulting in a variety of protein interaction networks, such as yeast, human, and pathogen protein interaction networks. These networks lay a good data foundation for research on complex identification. At the same time, the unreliable interactions in these data pose a great challenge to complex identification.

First, the research background and significance of protein complex recognition algorithms are introduced, along with the current state of research. Several open problems in the identification of protein complexes are then summarized: how to identify complexes effectively on human protein networks and reveal the relationship between protein complexes and disease; how to select the appropriate complex detection algorithm for different protein interaction networks to achieve higher performance; and how to integrate more features into complex detection to enhance algorithm performance. These problems limit the development of complex identification algorithms.

Then, in order to identify complexes on the human protein network effectively, we present an improved complex detection algorithm based on supervised learning. The improved algorithm makes full use of the network topology and fuses biological features based on the Gene Ontology to enhance the performance of complex recognition. Besides, to better reveal the relationship between protein complexes and diseases, disease-specific protein interactions extracted from the biological literature by a protein interaction extraction system are integrated into the original network to improve disease complex identification. Analysis of the disease-specific complexes detected with our method provides further biological insight into the disease.

Furthermore, in order to select the appropriate complex recognition algorithm for different protein networks, the adaptability of existing complex detection algorithms is explored on the human and yeast protein networks. In addition, the regression model of the existing algorithms is compared with a random forest model to reveal the effect of different features on different networks, providing a useful reference for the study of complex identification algorithms.

Finally, existing complex recognition algorithms all rely on hand-crafted features. To measure the effect of automatically learned features in complex recognition, our method incorporates node-vector-based learning into existing complex identification algorithms. Fusing these features makes the existing algorithms more effective.
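
To make the idea of topological features for a candidate complex concrete, the following is a minimal sketch and not the thesis's actual feature set. The function name `topology_features`, the chosen features (size, density, mean and maximum degree), and the toy edge list `ppi_edges` are all assumptions introduced for illustration. In a supervised setting, feature vectors of this kind would be computed for known complexes and for random subgraphs, then passed to a classifier or regressor.

```python
def topology_features(nodes, edges):
    """Return simple topological features of the subgraph induced by `nodes`.

    `edges` is an iterable of (protein_a, protein_b) pairs from a PPI network.
    The feature set here is a hypothetical example, not the thesis's own.
    """
    nodes = set(nodes)
    # Keep only undirected, non-self-loop edges with both endpoints inside the candidate.
    inner = {
        frozenset((a, b))
        for a, b in edges
        if a in nodes and b in nodes and a != b
    }
    n = len(nodes)
    possible = n * (n - 1) / 2  # number of node pairs in the candidate

    # Degree of each protein counted inside the candidate subgraph only.
    degree = {v: 0 for v in nodes}
    for edge in inner:
        for v in edge:
            degree[v] += 1

    return {
        "size": n,
        "density": len(inner) / possible if possible else 0.0,
        "mean_degree": sum(degree.values()) / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
    }


# Toy network (hypothetical proteins A-D): a triangle A-B-C plus a pendant edge C-D.
ppi_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

triangle = topology_features(["A", "B", "C"], ppi_edges)
with_pendant = topology_features(["A", "B", "C", "D"], ppi_edges)
```

In this toy network the triangle is fully connected, while adding the loosely attached protein D lowers the density of the candidate, which is the kind of signal a supervised model can learn to associate with true complexes.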
Keywords/Search Tags: Protein Interaction, Protein Interaction Network, Supervised Learning, Disease-Specific Complex, Node Vector