
Recognition Of Essential Proteins Based On Improved Edge Clustering Coefficient And K-nearest Neighbor Algorithm

Posted on: 2016-03-22  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Jiang  Full Text: PDF
GTID: 2180330467998802  Subject: Computer application technology
Abstract/Summary:
A complex network is a network abstracted from a real complex system, and it exhibits a high degree of complexity. In real life we live within many different networks, and society as a whole is covered by them: financial networks, the Internet, work networks, friendship networks, transportation networks, criminal networks, and so on. Further study of complex networks is therefore of great significance to our life and work.

The large number of proteins in an organism fall into two categories according to their importance to the organism: essential proteins and non-essential proteins. Essential proteins help the organism carry out specific functions, and their loss has a tremendous impact on the organism. They are therefore of great significance for the organism's survival and normal functioning. Identifying essential proteins from complex networks by computational means has become a research hotspot. Several classical algorithms exist in this field, such as degree centrality, betweenness centrality, and closeness centrality. They focus only on the importance of the nodes in the network and ignore the importance of the edges, which act as bridges between the nodes they connect. Some researchers later introduced the edge clustering coefficient (ECC) and proposed a new centrality algorithm (NC) and an algorithm based on peeling sorting. However, these algorithms share two problems: they do not effectively combine the characteristics of both nodes and edges, and they do not take advantage of the strengths of each algorithm.

To address these problems, we introduce the clustering coefficient (C) and the ECC. Based on the ECC, we then propose an improved edge clustering coefficient (IECC) and a new node and edge clustering method (NEC), built on the IECC, for essential protein prediction; NEC integrates both node and edge topological properties of the protein-protein interaction network. We then introduce the k-nearest neighbor (KNN) algorithm from machine learning.
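The edge clustering coefficient and the NC centrality mentioned above can be illustrated with a minimal sketch. It assumes the commonly cited form ECC(u, v) = z / min(d_u - 1, d_v - 1), where z is the number of triangles that contain the edge (u, v) and d is node degree, and NC(u) as the sum of ECC over the edges incident to u. The function names and the toy network are hypothetical, and the thesis's IECC and NEC are not defined in this abstract, so they are not reproduced here.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = z / min(d_u - 1, d_v - 1), z = triangles containing edge (u, v)."""
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if max_triangles <= 0:
        return 0.0  # an endpoint of degree 1 cannot close any triangle
    shared_neighbors = len(adj[u] & adj[v])  # each shared neighbor closes one triangle
    return shared_neighbors / max_triangles


def nc_scores(adj):
    """NC(u) = sum of ECC(u, v) over all neighbors v of u."""
    return {
        u: sum(edge_clustering_coefficient(adj, u, v) for v in adj[u])
        for u in adj
    }


# Hypothetical toy interaction network: a triangle A-B-C plus a pendant node D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)
scores = nc_scores(adj)

# Rank candidate proteins from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

In this toy network the pendant edge C-D lies in no triangle, so it contributes nothing to the score of C even though it raises the degree of C. That is the kind of distinction between edge-based and purely degree-based ranking that the abstract describes.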
Because traditional KNN frequently over-fits or under-fits, we introduce Bootstrap re-sampling to improve it. The improved model for protein prediction is called the Bootstrap k-Nearest Neighbor model (Bootstrap-KNN). Bootstrap-KNN describes each node with NEC together with features from other essential protein identification methods in order to further improve prediction performance. Because different algorithms suit different network structures, the new methods can predict more objectively and are more widely applicable.

To verify NEC and the Bootstrap-KNN model, we run simulations on the yeast protein network from DIP. Comparisons across multiple evaluation indexes show that NEC is more efficient than many traditional methods on the yeast protein-protein network and that Bootstrap-KNN achieves better results, which may provide some guidance for essential protein detection in biology.
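The abstract does not spell out how Bootstrap re-sampling is combined with KNN. One plausible reading, sketched below purely as an assumption, is a bagging-style ensemble: draw several bootstrap samples of the training set, run plain KNN on each, and take a majority vote. The function names, the two-dimensional feature vectors, and the toy labels (1 for essential, 0 for non-essential) are all hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Plain KNN: majority label among the k training points closest to query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def bootstrap_knn_predict(train, query, k=3, n_rounds=25, seed=0):
    """Assumed bagging-style variant: vote over KNN runs on bootstrap samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_rounds):
        # Resample the training set with replacement, keeping its original size.
        sample = [rng.choice(train) for _ in train]
        votes[knn_predict(sample, query, k)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical training data: (feature vector, label). The features stand in for
# per-protein scores such as a topology-based score and a normalized degree.
train = [
    ((0.90, 0.80), 1), ((0.80, 0.90), 1), ((0.85, 0.70), 1), ((0.70, 0.95), 1),
    ((0.10, 0.20), 0), ((0.20, 0.10), 0), ((0.15, 0.30), 0), ((0.30, 0.15), 0),
]

high_scoring = bootstrap_knn_predict(train, (0.80, 0.80))
low_scoring = bootstrap_knn_predict(train, (0.20, 0.20))
print(high_scoring, low_scoring)
```

Voting over resampled training sets reduces the sensitivity of a single KNN run to individual training points, which is one way re-sampling could address the over-fitting and under-fitting the abstract mentions.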
Keywords/Search Tags: Complex network, key nodes, machine learning, k-nearest neighbor, Bootstrap-KNN model