Research On Classification Methods Of Critical Nodes In Protein Interaction Networks Based On Support Vector Machine

Posted on: 2015-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zha
GTID: 2180330467477741
Subject: Computer technology
Abstract/Summary:
Proteins are the material basis of life, the primary components of cells, and the main carriers of life activities, so they are closely tied to every life process. Different proteins affect life activities differently. Essential proteins are those that are vital to the survival and reproduction of an organism and to the normal execution of its specific biological functions, and they therefore play a critical role in its survival. Identifying and predicting essential proteins helps to understand the cell growth cycle more deeply, clarifies the internal mechanisms of life activities, and advances research on biological evolution. With the development of high-throughput technology in the proteomics era, experimental protein data are growing day by day, and the prediction of essential proteins in biological networks, such as protein interaction networks, has gradually become a new research hotspot.
This thesis explores the key features for essential protein identification from the viewpoint of fusing topological centrality measures in protein interaction networks, building on an analysis of the classical topological centrality measures, and designs a new, effective essential protein identification method. The main research work can be summarized as follows:
Firstly, we briefly describe the basic concepts of topological centrality measures, which are indexes that characterize how important a node's role is in a complex network. Identification methods based on a topological centrality measure usually reflect only a single characteristic of the nodes in a protein interaction network, so to date they have not represented the full essentiality of proteins effectively.
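As a rough illustration of what a topological centrality measure computes, the sketch below derives two classical measures, degree centrality and closeness centrality, for a tiny hypothetical network. The protein names, the edge list, and the choice of these two measures are assumptions made for illustration; the abstract does not say which measures the thesis uses.

```python
from collections import deque

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(Number of reachable nodes) / (sum of shortest-path distances), via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical toy interaction network; names and edges are placeholders.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

deg = degree_centrality(adj)
clo = closeness_centrality(adj)

# One feature vector per protein: (degree centrality, closeness centrality).
features = {v: (deg[v], clo[v]) for v in sorted(adj)}
for protein, vector in features.items():
    print(protein, vector)
```

Each measure captures one aspect of a node's position: degree counts direct partners, while closeness reflects how near a node is to all the others. Stacking several such values per protein gives a feature vector of the kind a classifier can consume.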
At the same time, recent results report that protein essentiality is multi-dimensional and multi-level. We therefore choose multiple representative topological centrality measures, design a reasonable mechanism for fusing them, and propose constructing a feature space from these measures.
Secondly, we convert the sorting-and-screening mode of existing essential protein identification methods into a classification task in the feature space constructed from the topological centrality measures. Based on statistical analyses of the nodes and interactions in protein interaction networks, we treat this task as a two-class classification problem. We therefore adopt the support vector machine, a classification method backed by a relatively abundant body of research, to build our new network-level prediction method for essential proteins, TC_SVM, which offers a new approach to essential protein identification. The experimental results indicate that, in comparative analyses of statistical indexes, TC_SVM generally outperforms ten classical reference topological centrality measures. Its advantage is especially clear on F-measure and AUC, which measure classification performance comprehensively.
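For readers unfamiliar with the two indexes named above, the following minimal sketch computes F-measure and AUC from their standard definitions. The labels and scores are made-up toy values, not results from the thesis, and the 0.5 decision threshold is an assumption.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 1 = essential, 0 = non-essential.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.1, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("F-measure:", f_measure(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

F-measure depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is why the two are often reported together.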
The performance analysis and comparison indicate that it is feasible to fuse multiple classical topological centrality measures rationally, and that constructing new identification methods for essential proteins based on the idea of classification can enrich related research.
Converting the sorting-and-screening mode of existing essential protein identification methods into a classification task in a feature space built by fusing multiple centrality measures, as proposed in this thesis, extends the research field of essential protein identification, improves prediction accuracy, and provides a new and valuable bioinformatics method for identifying essential proteins.
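To make the idea of classification in a centrality feature space concrete, here is a minimal sketch of a linear support vector machine trained by subgradient descent on the regularized hinge loss. It is not the TC_SVM implementation: the linear kernel, the hyperparameters, the `pos_weight` option for class imbalance, and the toy feature values are all assumptions chosen for illustration.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01, pos_weight=1.0):
    """Subgradient descent on the L2-regularized hinge loss.

    Labels are +1 (essential) and -1 (non-essential). pos_weight scales the
    loss of positive examples, one common remedy for class imbalance
    (an assumption here, not a detail taken from the thesis).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            weight = pos_weight if yi == 1 else 1.0
            # Regularization step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge step: only examples inside the margin move the boundary.
            if margin < 1.0:
                w = [wj + lr * weight * yi * xj for wj, xj in zip(w, xi)]
                b += lr * weight * yi
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical feature vectors: (degree centrality, closeness centrality).
X = [(1.0, 1.0), (0.9, 0.8), (0.1, 0.1), (0.0, 0.2), (0.2, 0.0), (0.1, 0.0)]
y = [1, 1, -1, -1, -1, -1]  # fewer essential than non-essential proteins

w, b = train_linear_svm(X, y, pos_weight=2.0)
predictions = [predict(w, b, x) for x in X]
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

Under this framing each protein receives a class label from the decision boundary, where a ranking method would sort proteins by one score and cut the list at a chosen rank.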
Keywords/Search Tags: protein interaction network, essential protein, topological centrality measure, imbalanced classification, support vector machine