
Methods On Constructing Protein-Protein Interaction Network By Merging Multiple Data Sources

Posted on: 2010-12-30  Degree: Master  Type: Thesis
Country: China  Candidate: X F Yang
GTID: 2120360302959715  Subject: Computer application technology
Abstract/Summary:
Protein-protein interactions (PPIs) play an important role in almost every cellular process. Mapping the complete set of protein interactions in an organism is of profound significance for understanding the fundamental principles underlying biological systems. Traditional experimental methods can detect only one interaction at a time, which is time-consuming and laborious. Although these methods have accumulated a considerable amount of data, that data is far from complete. Recent advances in high-throughput experimental technologies have generated an enormous amount of data and provide valuable resources for studying protein interactions. However, these technologies suffer from high error rates due to their inherent limitations. Computational approaches have therefore become an important complementary way to identify protein interactions, and a hot research topic.

This thesis combines several types of data from different sources and applies machine learning techniques to them to construct the protein-protein interaction network (PPIN) of the model organism Saccharomyces cerevisiae. The main work of the thesis consists of the following parts:
1. Collecting data from the Internet, preprocessing it into feature values for the subsequent prediction task, and storing them in a database.
2. Applying different machine learning methods to these data to predict the protein interaction network, and validating the predicted PPIN by searching for established signaling pathways.
3. Proposing a new classifier for PPI prediction based on communication theory, comparing it with the naïve Bayesian method, and validating the results against experimental data.
4. Integrating topological properties with genomic metrics to correct the PPIN and obtain more reliable interactions.
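The abstract names a naïve Bayesian method as the baseline for the proposed classifier. As a rough illustration of what such a baseline commonly looks like when merging several data sources (a minimal sketch, not the thesis's implementation), the code below estimates a likelihood ratio for each discretized feature value and multiplies these ratios with the prior odds to score a candidate protein pair. The feature names (`coexpr`, `go_shared`, `coloc`), the helper functions, the Laplace smoothing, and the toy training pairs are all hypothetical assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import exp, log


def fit_likelihood_ratios(pairs, alpha=1.0):
    """Estimate LR(f = v) = P(f = v | interacting) / P(f = v | non-interacting)
    for every discretized feature value, with Laplace smoothing `alpha`.
    Assumes every pair carries a value for every feature."""
    counts = {1: defaultdict(Counter), 0: defaultdict(Counter)}
    totals = Counter(label for _, label in pairs)
    for features, label in pairs:
        for name, value in features.items():
            counts[label][name][value] += 1
    ratios = {}
    for name in set(counts[1]) | set(counts[0]):
        values = set(counts[1][name]) | set(counts[0][name])
        k = len(values)
        ratios[name] = {
            v: ((counts[1][name][v] + alpha) / (totals[1] + alpha * k))
            / ((counts[0][name][v] + alpha) / (totals[0] + alpha * k))
            for v in values
        }
    # Prior odds estimated from the class ratio of the (toy) training set.
    return ratios, totals[1] / totals[0]


def log_posterior_odds(features, ratios, prior_odds):
    """Naive Bayes: log O_post = log O_prior + sum of log LR over sources.
    Unseen feature values contribute LR = 1 (no evidence either way)."""
    score = log(prior_odds)
    for name, value in features.items():
        score += log(ratios.get(name, {}).get(value, 1.0))
    return score


# Hypothetical toy training set (not data from the thesis): one discretized
# feature per data source for each protein pair; label 1 = interacting.
training = [
    ({"coexpr": "high", "go_shared": "yes", "coloc": "yes"}, 1),
    ({"coexpr": "high", "go_shared": "yes", "coloc": "no"}, 1),
    ({"coexpr": "low", "go_shared": "yes", "coloc": "yes"}, 1),
    ({"coexpr": "low", "go_shared": "no", "coloc": "no"}, 0),
    ({"coexpr": "low", "go_shared": "no", "coloc": "yes"}, 0),
    ({"coexpr": "high", "go_shared": "no", "coloc": "no"}, 0),
    ({"coexpr": "low", "go_shared": "no", "coloc": "no"}, 0),
    ({"coexpr": "low", "go_shared": "yes", "coloc": "no"}, 0),
]
ratios, prior_odds = fit_likelihood_ratios(training)
candidate = {"coexpr": "high", "go_shared": "yes", "coloc": "yes"}
odds = exp(log_posterior_odds(candidate, ratios, prior_odds))
print(odds > 1.0)  # True: posterior odds of about 7.4 on this toy data
```

Multiplying likelihood ratios assumes the data sources are conditionally independent given the interaction label, so correlated sources (for example, co-expression and shared functional annotation) get double-counted. Logistic regression, listed among the keywords, instead learns a weight per feature and can partly compensate for such correlations.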
Keywords/Search Tags: Protein-protein interaction, Multiple data sources, Logistic regression, Information theory, Topology