
Exploiting Fuzzy Spectral Clustering In Protein-Complex Detection

Posted on: 2013-05-03  Degree: Master  Type: Thesis
Country: China  Candidate: D E Na
GTID: 2230330374988780  Subject: Computer Science and Technology
Abstract/Summary:
One of the great challenges of molecular biology is to reconstruct the complete network of protein interactions within cells, in the hope of shedding unprecedented light on the inner workings of the cellular machinery. Thanks to recently developed high-throughput techniques, many protein-protein interactions have been discovered over the years, and many databases have been created to store the related information. Alongside biological experiments, a wide variety of algorithms for detecting protein functional modules have been designed.

Graph-based functional module detection algorithms are among the most widely used of these. However, there is no reliable evidence that they give biologically significant results. Because protein interaction networks are noisy, many false positives and false negatives may be introduced. In addition, many of these techniques tend to ignore the multi-functionality of proteins.

The main purpose of this dissertation was to develop a new protein module detection approach that addresses the shortcomings of graph-based protein functional module detection algorithms and improves the biological significance of their results. In our study, we designed a migration strategy that allows proteins to migrate between clusters so that each protein is finally grouped with biologically similar proteins. The fuzzy c-means clustering algorithm was adopted because it fits the migration principle well and is well suited to describing the inherent uncertainty of biological networks. In addition, spectral clustering was used to measure distances in the network more precisely and to cope with the high dimensionality of the data.
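The combination of spectral clustering and fuzzy c-means described above can be illustrated with a minimal sketch. It assumes NumPy, an unweighted adjacency matrix, and the Ng-Jordan-Weiss style embedding (eigenvectors of the k smallest eigenvalues of the symmetric normalized Laplacian, row-normalized). The function names `spectral_embedding` and `fuzzy_c_means`, the farthest-point initialisation and the toy network are hypothetical and not taken from the thesis, and the thesis's migration strategy itself is not reproduced. Each protein is embedded as a point, and fuzzy c-means then gives it a graded membership in every cluster rather than a single hard label.

```python
import numpy as np


def spectral_embedding(adj: np.ndarray, k: int) -> np.ndarray:
    """Embed nodes with the eigenvectors of the k smallest eigenvalues of the
    symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2), row-normalized
    as in Ng-Jordan-Weiss spectral clustering (assumed variant)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues returned in ascending order
    emb = vecs[:, :k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms == 0, 1.0, norms)


def fuzzy_c_means(x: np.ndarray, c: int, m: float = 2.0,
                  iters: int = 300, tol: float = 1e-8):
    """Standard fuzzy c-means; returns (memberships n x c, centers c x d)."""
    # Hypothetical deterministic farthest-point initialisation of the centers.
    idx = [0]
    for _ in range(1, c):
        d = np.linalg.norm(x[:, None, :] - x[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = x[idx].astype(float)
    for _ in range(iters):
        dist = np.fmax(np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)); each row sums to 1.
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        u = 1.0 / ratio.sum(axis=2)
        # Centers are membership-weighted means with weights u^m.
        w = u ** m
        new_centers = (w.T @ x) / w.sum(axis=0)[:, None]
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break
    return u, centers


# Hypothetical toy network: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
adj = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a, b] = adj[b, a] = 1.0

emb = spectral_embedding(adj, k=2)
u, centers = fuzzy_c_means(emb, c=2)
labels = u.argmax(axis=1)  # hard label = cluster with the highest membership
print(np.round(u, 3))
```

Because every row of `u` sums to 1, a protein whose membership is split across clusters (for example, above some chosen threshold in two of them) could be reported in more than one candidate complex, which is one way to reflect protein multi-functionality; any such threshold would be a tuning assumption.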
We studied these techniques to understand their advantages and limitations and to define metrics that take into account the biological and topological characteristics of proteins, in order to adapt fuzzy c-means and spectral clustering to the context of protein networks. To investigate the impact of different biological data on the distance calculation, we defined three biological distances: the first is based only on Gene Ontology (GO) annotations, the second solely on domain interaction information, and the third combines both kinds of information. Tests were run on the Saccharomyces cerevisiae network, in which we tried to improve the results of three widely used graph-based algorithms: MCL, MCODE and DPClus. The results showed that notable biological improvements were achieved.
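The three kinds of biological distance could be sketched as follows. These definitions are illustrative assumptions, not the thesis's exact formulas: the GO distance is taken as the Jaccard distance between the two proteins' annotation sets, the domain distance as one minus the fraction of cross-protein domain pairs listed as known domain-domain interactions, and the combined distance as a weighted average with a hypothetical weight `alpha`. The GO terms and domain identifiers in the example are placeholders.

```python
from itertools import product


def go_distance(go_a: set, go_b: set) -> float:
    """Jaccard distance between two proteins' GO annotation sets (assumed form)."""
    union = go_a | go_b
    if not union:
        return 1.0  # no annotations: treat as maximally distant (assumption)
    return 1.0 - len(go_a & go_b) / len(union)


def domain_distance(dom_a: set, dom_b: set, ddi: set) -> float:
    """1 - fraction of cross-protein domain pairs that are known DDIs.
    `ddi` holds frozensets so that (d1, d2) and (d2, d1) both match."""
    pairs = list(product(dom_a, dom_b))
    if not pairs:
        return 1.0
    hits = sum(1 for d1, d2 in pairs if frozenset((d1, d2)) in ddi)
    return 1.0 - hits / len(pairs)


def combined_distance(go_a, go_b, dom_a, dom_b, ddi, alpha: float = 0.5) -> float:
    """Weighted mix of the GO and domain distances; alpha is hypothetical."""
    return alpha * go_distance(go_a, go_b) + (1.0 - alpha) * domain_distance(dom_a, dom_b, ddi)


# Placeholder annotations for two proteins p and q.
go_p, go_q = {"GO:A", "GO:B", "GO:C"}, {"GO:B", "GO:C", "GO:D"}
dom_p, dom_q = {"D1", "D2"}, {"D3", "D4"}
ddi = {frozenset(("D1", "D3"))}
print(combined_distance(go_p, go_q, dom_p, dom_q, ddi))  # prints 0.625
```

Setting `alpha` to 1.0 or 0.0 recovers the GO-only or domain-only distance, which mirrors the comparison of the three distances described above.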
Keywords/Search Tags: systems biology, protein interaction network, domain-domain interactions, fuzzy sets, spectral clustering, algebraic graph theory, clustering algorithms