
The Study Of Analysis And Application Of Protein-protein Interaction Data Based On Graph And Complex Networks Theory

Posted on: 2011-03-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H You
GTID: 1100330332969203
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Protein-protein interactions (PPI) play a very important role in almost every cellular process. With rapid advances in high-throughput experimental techniques, protein interactions can now be detected directly and systematically at the whole-genome level for many organisms. In addition to the direct experimental data, a number of computational approaches have been proposed to predict sets of interacting protein pairs. Unfortunately, protein interactions detected by high-throughput experimental methods or predicted by computational methods are reported to be noisy, with high rates of both false positives and false negatives. In this dissertation, we propose computational algorithms to assess the reliability of interactions in the noisy data and then to predict new interactions.

The purpose of this study was to investigate protein interaction networks from the topological aspect, and to develop four effective computational methods to automatically purify these networks, i.e., to detect false positive interactions in existing protein interaction networks and to discover unknown false negative interactions from their topological properties. Finally, we present a novel application of PPI networks to the reconstruction of a signaling pathway. The main contributions of this dissertation are as follows.

(1) High-throughput experimental protein interaction data are prone to high false positive rates. A novel and effective approach was proposed to deal with this issue by integrating heterogeneous types of high-throughput biological data with weighted network topological metrics. We evaluated the proposed method on Gavin's yeast interaction dataset. The experimental results show that, by incorporating heterogeneous data types with weighted network topological metrics, the proposed method improves functional homogeneity and localization coherence compared with existing approaches.

(2) A robust manifold embedding technique was developed for assessing the reliability of interactions and predicting new interactions. It uses only the topological information of PPI networks and can work on a sparse input protein interactome without requiring additional information types. After a given PPI network is transformed into a low-dimensional metric space using manifold embedding based on isometric feature mapping (ISOMAP), the problem of assessing and predicting protein interactions can be recast as measuring the similarity between points in that metric space. A reliability index, a likelihood indicating the interaction of two proteins, was then assigned to each protein pair in the PPI network based on the similarity between the points in the embedded space. The proposed method was validated with extensive experiments on both densely connected and sparse PPI networks of yeast. The results demonstrate that the interactions ranked highest by the proposed method have high functional homogeneity and localization coherence. In particular, the method performs well on large sparse PPI networks, on which traditional algorithms fail. The proposed algorithm is therefore a promising method for detecting both false positive and false negative interactions in PPI networks.

(3) A novel algorithm was proposed that combines the line graph with weighted network topological metrics to eliminate false positive interactions from a PPI network. A novel weighted line graph transformation method was first used to transform a PPI network into a line graph. Then, a number of network topological properties were computed. To define the similarity of proteins in the PPI network, a weighted Czekanowski-Dice distance metric was calculated on the basis of the resulting weighted PPI network. Finally, the metrics were used to assess the reliability of the PPI data. The experimental results demonstrate that, by removing false positive protein interactions from the S. cerevisiae PPI network, the reliability of the PPI dataset was significantly increased.

(4) A computational systems biology approach was introduced for the accurate prediction of pairwise synthetic genetic interactions (SGI). First, a high-coverage and high-precision functional gene network (FGN) was constructed by integrating protein-protein interaction (PPI), protein complex, and gene expression data. Then, a graph-based semi-supervised learning (SSL) classifier was used to identify SGI, with the topological properties of protein pairs in the weighted FGN as input features of the classifier. We compared the proposed SSL method with a state-of-the-art supervised classifier, the support vector machine (SVM), on a benchmark dataset in S. cerevisiae to validate the ability of our method to distinguish synthetic genetic interactions from non-interacting gene pairs. The experimental results show that the proposed method can accurately predict genetic interactions in S. cerevisiae.

(5) A systems biology method was introduced to study the Drosophila melanogaster MAPK signaling pathways by combining RNA interference (RNAi) technology and fluorescence microscopy with automated image analysis. A high-quality functional gene network (FGN) was first derived by integrating high-content screening (HCS) data and heterogeneous genomic data using a linear SVM classifier. The FGN was then analyzed and the MAPK pathway was extracted using an extended integer linear programming approach. We validated our results and demonstrate that the proposed method achieves full coverage of the components deposited in KEGG for the MAPK pathway. Interestingly, we also retrieved a set of additional candidate genes for this pathway that are consistent with the published literature.
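
The manifold-embedding idea in contribution (2) can be illustrated with a minimal sketch. It is not the dissertation's implementation, and it rests on several assumptions: the PPI network is unweighted and connected, the network itself serves as the ISOMAP neighbourhood graph, geodesic distances are hop counts, and the reliability index is taken to be 1 / (1 + Euclidean distance) in the embedded space. The abstract does not specify the similarity measure, so that formula and the function names (`geodesic_distances`, `classical_mds`, `reliability_index`) are hypothetical.

```python
from collections import deque

import numpy as np


def geodesic_distances(n_nodes, edges):
    """All-pairs shortest-path lengths (in hops), via BFS from every node."""
    neighbours = [[] for _ in range(n_nodes)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dist = np.full((n_nodes, n_nodes), np.inf)
    for source in range(n_nodes):
        dist[source, source] = 0.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if np.isinf(dist[source, v]):
                    dist[source, v] = dist[source, u] + 1.0
                    queue.append(v)
    if np.isinf(dist).any():
        # Assumption of this sketch: a single connected component.
        raise ValueError("network must be connected for this sketch")
    return dist


def classical_mds(dist, n_components):
    """Embed nodes so that Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean residue) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def reliability_index(coords, u, v):
    """Hypothetical score in (0, 1]: closer embedded points score higher."""
    return 1.0 / (1.0 + np.linalg.norm(coords[u] - coords[v]))


# Toy network: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
toy_coords = classical_mds(geodesic_distances(6, toy_edges), n_components=2)

print("same cluster (0, 1):", reliability_index(toy_coords, 0, 1))
print("across clusters (0, 5):", reliability_index(toy_coords, 0, 5))
```

In this toy network the pair inside one triangle scores higher than the pair spanning the two triangles. Under the assumptions above, a low score on an observed edge would flag a candidate false positive, and a high score on an unobserved pair would flag a candidate false negative.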
Keywords/Search Tags: Protein-Protein Interactions, Complex Network Theory, Semi-supervised Learning, Manifold Learning, Genetic Interaction, Heterogeneous Data Sources Integration, Signaling Transduction Pathway, False Positives, False Negatives