
Protein Complex Detection Based On Data Integration And Supervised Learning Method

Posted on: 2015-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Yu
GTID: 2180330467980402
Subject: Computer application technology

Abstract:
With the completion of the genome project, humankind has entered the post-genome era and has come to recognize the importance of protein molecules in life processes. Previous studies show that most proteins form complexes to carry out their biological functions, so protein complexes are important for understanding the principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. Protein complex detection has therefore become an active research topic in systems biology.

This thesis first introduces the research background, the state of the art, related knowledge, and evaluation metrics. Two problems degrade the performance of complex detection algorithms. First, protein interactions produced by high-throughput experiments often have high false positive and false negative rates. Second, popular protein complex detection methods usually apply pre-defined rules to find dense regions and report them as protein complexes; however, not all dense regions are protein complexes.

We then present an approach that integrates PPI datasets with PPI data from the biomedical literature for protein complex detection, which addresses the false negative problem of PPI datasets. Moreover, because PPI data in the biomedical literature are contributed by biologists and are therefore relatively accurate, integrating them into existing PPI datasets is expected to improve complex detection performance.
The approach uses PPIExtractor to extract PPI data from the biomedical literature, which are then integrated into the PPI datasets for protein complex detection.

Finally, we present a protein complex detection method that combines PPI networks, new PPI data from the biomedical literature, and a supervised learning method. It addresses the false positive and false negative problems of PPI networks as well as the limitation of detection methods that rely on pre-defined rules. In the first step, new PPI data from the biomedical literature are integrated into the PPI networks, which addresses the false negative problem and the sparsity of PPI networks. In the second step, biological and topological characteristics are used to measure the reliability of each protein-protein interaction, and low-reliability interactions are filtered out of the PPI networks, which addresses the false positive problem. In the third step, the supervised learning method SLPC, which can exploit the information in known complexes, is used to detect protein complexes. Predicted complexes are generated by detecting maximal cliques, growing those cliques, and filtering them, which addresses the limitation of rule-based detection methods.

In conclusion, integrating PPI datasets with PPI data from the biomedical literature yields significant improvements for various protein complex detection methods on different PPI datasets. Our method, which combines PPI networks, new PPI data from the biomedical literature, and supervised learning, outperforms ClusterONE on different PPI datasets.
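The first two steps of the final method (merging literature-derived interactions into the network, then dropping low-reliability edges) can be illustrated with a minimal sketch. It assumes the literature PPIs have already been extracted as protein pairs (PPIExtractor itself is not reproduced here). The abstract does not specify which biological and topological characteristics are used, so the reliability score below is a hypothetical stand-in: the mean of a topological score (Jaccard similarity of the two proteins' closed neighbourhoods) and a biological score (Jaccard similarity of hypothetical annotation-term sets), with an arbitrary threshold of 0.6. All function names and the toy data are invented for illustration.

```python
from itertools import chain

def merge_ppi(dataset_edges, literature_edges):
    """Step 1 (sketch): union of experimental and literature-derived PPIs.
    Each undirected edge becomes a frozenset, so (a, b) and (b, a) collapse
    into one edge; self-interactions are dropped."""
    merged = set()
    for a, b in chain(dataset_edges, literature_edges):
        if a != b:
            merged.add(frozenset((a, b)))
    return merged

def adjacency(edges):
    """Build a protein -> set-of-neighbours map from undirected edges."""
    adj = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def filter_edges(edges, annotations, threshold=0.6):
    """Step 2 (sketch): keep edges whose reliability score >= threshold.
    Reliability = mean of a topological score (Jaccard of closed
    neighbourhoods) and a biological score (Jaccard of annotation sets).
    Both scores and the threshold are hypothetical stand-ins."""
    adj = adjacency(edges)
    kept = set()
    for edge in edges:
        a, b = tuple(edge)
        topo = jaccard(adj[a] | {a}, adj[b] | {b})
        bio = jaccard(annotations.get(a, set()), annotations.get(b, set()))
        if (topo + bio) / 2 >= threshold:
            kept.add(edge)
    return kept

# Toy data, invented for illustration only.
dataset = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("X", "X")]
literature = [("B", "A"), ("A", "D"), ("E", "F")]
annotations = {"A": {"t1", "t2"}, "B": {"t1"}, "C": {"t2"},
               "D": {"t1", "t2"}, "E": {"t9"}, "F": {"t8"}}

merged = merge_ppi(dataset, literature)
reliable = filter_edges(merged, annotations)
print(len(merged), sorted(tuple(sorted(e)) for e in reliable))
# → 6 [('A', 'B'), ('A', 'C'), ('A', 'D'), ('C', 'D')]
```

Representing edges as frozensets makes the union order-insensitive, which matters because literature-derived pairs rarely follow the dataset's orientation. Averaging the two scores is only one possible combination; a real system could weight them or feed both into a classifier.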
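The candidate-generation stage of the third step (detect maximal cliques, grow them, filter them) can be sketched in Python as follows. This is not SLPC: the abstract does not describe its features, classifier, or growth rule. Maximal cliques are found here with the standard Bron-Kerbosch algorithm with pivoting. The growth rule (add an outside protein adjacent to at least a fraction `alpha` of the members) and the `density` scorer are hypothetical placeholders for whatever SLPC learns from known complexes; in a supervised setting, `score` would be a model trained on features of known complexes.

```python
def adjacency(edges):
    """Protein -> set-of-neighbours map for an undirected PPI network."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields every maximal clique once."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def grow(clique, adj, alpha=0.8):
    """Hypothetical growth rule: repeatedly add an outside protein that is
    adjacent to at least a fraction alpha of the current members."""
    members = set(clique)
    changed = True
    while changed:
        changed = False
        outside = set().union(*(adj[m] for m in members)) - members
        for v in sorted(outside):
            if len(adj[v] & members) / len(members) >= alpha:
                members.add(v)
                changed = True
                break
    return frozenset(members)

def density(members, adj):
    """Fraction of possible internal edges that are present."""
    n = len(members)
    if n < 2:
        return 0.0
    internal = sum(len(adj[v] & members) for v in members) / 2
    return internal / (n * (n - 1) / 2)

def predict_complexes(adj, score=density, threshold=0.7, min_size=3):
    """Detect maximal cliques, grow them, then filter. `score` stands in for
    a classifier trained on known complexes; density is only a placeholder."""
    grown = {grow(c, adj) for c in maximal_cliques(adj) if len(c) >= min_size}
    return {c for c in grown if score(c, adj) >= threshold}

# Toy network, invented for illustration: a 5-clique A-E, protein F linked
# to A-D, a triangle G-H-I, and an isolated pair J-K.
k5 = [(a, b) for i, a in enumerate("ABCDE") for b in "ABCDE"[i + 1:]]
edges = k5 + [("F", p) for p in "ABCD"] + [("G", "H"), ("H", "I"), ("G", "I"), ("J", "K")]
complexes = predict_complexes(adjacency(edges))
print(sorted("".join(sorted(c)) for c in complexes))  # → ['ABCDEF', 'GHI']
```

The two overlapping maximal cliques {A,B,C,D,E} and {A,B,C,D,F} both grow into the same six-protein set, and collecting the grown sets into a Python set removes the duplicate. The pair J-K is dropped by `min_size`. This illustrates why growing and filtering are useful on top of raw clique detection: complexes are often dense but not fully connected.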
Keywords: Biomedical Literature, Protein-Protein Interaction, Protein Interaction Network, Protein Complex, Supervised Learning