Recognition Of Essential Proteins Based On Improved Edge Clustering Coefficient And K-nearest Neighbor Algorithm

Posted on:2016-03-22

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Jiang

Full Text:PDF

GTID:2180330467998802

Subject:Computer application technology

Abstract/Summary:

A complex network is a network abstracted from a real complex system, and it exhibits a high degree of complexity. In real life, we live within many different networks, and society as a whole is covered by them: financial networks, the Internet, work networks, friendship networks, transportation networks, criminal networks and so on. Further study of complex networks is therefore of great significance to our life and work.

The large number of proteins in an organism fall into two categories according to their importance to the organism: essential proteins and non-essential proteins. Essential proteins help the organism carry out specific functions, and their loss has a tremendous impact on the organism. They are therefore of great significance to the organism's survival and normal functioning. Identifying essential proteins from complex networks by computational means has become a research hotspot. Several classical algorithms exist in this field, such as degree centrality, betweenness centrality and closeness centrality. These focus only on the importance of the nodes in the network and ignore the importance of the edges, which act as bridges between the nodes they connect. Some researchers later introduced the edge clustering coefficient (ECC) and proposed a new centrality algorithm (NC) and an algorithm based on peeling sorting. However, these algorithms share two problems: they do not effectively combine the dual characteristics of nodes and edges, and they do not take the advantages of each individual algorithm into account.

To address these problems, we introduce the clustering coefficient (C) and the ECC. Then, based on the ECC, we propose an improved edge clustering coefficient (IECC) and a new node and edge clustering method (NEC) based on IECC for essential protein prediction, which integrates both node and edge topological properties of the protein-protein interaction network. We then introduce the k-nearest neighbor (KNN) algorithm from machine learning.
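
As a point of reference for the node-based and edge-based measures named above, the following is a minimal sketch of degree centrality, the edge clustering coefficient, and the NC score as they are commonly defined in the literature: ECC(u, v) is the number of triangles through edge (u, v) divided by min(deg(u) - 1, deg(v) - 1), and NC(u) is the sum of ECC(u, v) over the neighbors v of u. It does not implement the IECC or NEC proposed in this thesis, whose definitions are not given in the abstract. The function names and the toy four-protein network are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, skipping self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Node-only measure: the number of interaction partners of each protein."""
    return {u: len(neighbors) for u, neighbors in adj.items()}


def edge_clustering_coefficient(adj, u, v):
    """Standard ECC: triangles through (u, v) over the maximum possible number."""
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if possible <= 0:
        return 0.0  # an endpoint of degree 1 cannot be part of a triangle
    triangles = len(adj[u] & adj[v])  # common neighbors close a triangle
    return triangles / possible


def nc_scores(adj):
    """NC(u): sum of ECC(u, v) over all neighbors v of u."""
    return {
        u: sum(edge_clustering_coefficient(adj, u, v) for v in adj[u])
        for u in adj
    }


# Hypothetical toy network: a triangle A-B-C with a pendant protein D on C.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_adj = build_graph(toy_edges)
print(degree_centrality(toy_adj))
print(nc_scores(toy_adj))
```

In this toy network, degree centrality ranks C highest because of its extra pendant edge, whereas NC gives the edge C-D no weight because it lies on no triangle.
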
Because traditional KNN frequently over-fits or under-fits, we introduce Bootstrap re-sampling to improve it. The improved model for protein prediction is called the Bootstrap k-Nearest Neighbor model (Bootstrap-KNN). Bootstrap-KNN characterizes each node using NEC together with features from other essential protein identification methods in order to further improve prediction performance. Because different algorithms suit different network structures, our new methods can predict more objectively and are more widely applicable.

To verify NEC and the Bootstrap-KNN model, we run simulations on the yeast protein network from DIP. Comparisons across multiple evaluation indexes show that NEC is more efficient than many traditional methods on the yeast protein-protein interaction network and that Bootstrap-KNN achieves better results, which may provide some guidance for essential protein detection in biology.
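
The abstract does not spell out how Bootstrap re-sampling is combined with KNN. One common reading, sketched here purely as an assumption, is a bagging scheme: train KNN on several bootstrap samples of the labeled proteins and take a majority vote. The function names, the two-feature vectors (standing in for centrality scores such as NEC), the labels (1 for essential, 0 for non-essential), and the parameters `k`, `n_rounds` and `seed` are all hypothetical.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to the query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def bootstrap_knn_predict(train, query, k=3, n_rounds=25, seed=0):
    """Assumed bagging variant: vote over KNN models fit on bootstrap samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_rounds):
        # Draw len(train) examples with replacement (one bootstrap sample).
        sample = [rng.choice(train) for _ in train]
        votes[knn_predict(sample, query, k)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical training data: (feature vector, label) pairs.
training_set = [
    ((5.0, 5.2), 1), ((4.8, 5.1), 1), ((5.3, 4.9), 1), ((5.1, 4.7), 1),
    ((0.2, 0.1), 0), ((0.0, 0.4), 0), ((0.3, 0.3), 0), ((0.1, 0.0), 0),
]
print(bootstrap_knn_predict(training_set, (5.1, 4.9)))
print(bootstrap_knn_predict(training_set, (0.1, 0.2)))
```

Voting over re-sampled training sets reduces the sensitivity of a single KNN model to individual training points, which is one way to address the fitting problems the abstract mentions.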

Keywords/Search Tags:

Complex network, key nodes, machine learning, k-nearest neighbor, Bootstrap-KNN model

Related items

1	Stock Price Prediction Based On Support Vector Machine And K-nearest Neighbor Algorithm
2	3D Mineral Prospectivity Modeling Based On Machine Learning
3	ROC Analysis Method Based On K-nearest Neighbor Classifier
4	Research Of The DNA Sequence Classification Algorithm Based On Machine Learning
5	Research Of Classification Algorithm Based On K Nearest Neighbor
6	Research On The Application Of Locally Sensitive Hashing In Network Representation Learning
7	Research And Methods Of Identifying Influential Nodes In Complex Networks Based On Deep Learning
8	Quantum Algorithm For K-Nearest Neighbors Classification Based On The Tensor Network Method
9	Research On Hash Learning Based Approximate Nearest Neighbor Search Method
10	An Improved Bayesian Model Applied To The Prediction Of Cytochrome P450 Enzyme-Substrate Selectivity