
Relationship Between Prediction Results Of Machine Learning-based Protein-protein Interaction And Sample Repeatability

Posted on: 2016-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: T Cheng
Full Text: PDF
GTID: 2180330479993919
Subject: Computer system architecture
Abstract/Summary:
Protein-protein interaction (PPI) plays a significant role in many biological processes and underlies the normal physiological functions of all living organisms. Although a large amount of PPI data has been accumulated through biological experiments, the experimental results always contain some level of false positives and false negatives. In recent years, computational approaches, machine learning in particular, have been widely applied to PPI prediction, and many researchers have reported high accuracy. However, experiments have shown that the accuracy reported for most methods is overestimated and that prediction results depend on the datasets used.

This thesis used several machine learning algorithms, combined with the auto covariance sequence coding method, to predict human PPIs and to study how protein repeatability in the datasets affects the prediction results. First, positive datasets with different levels of protein repeatability were constructed using the mining principles of dense graphs and sparse graphs, together with the maximum matching algorithm for simple graphs from graph theory. The corresponding negative datasets, also with different levels of protein repeatability, were constructed by a correlation calculation on the graph adjacency matrix. Each positive dataset was then combined with its corresponding negative dataset to form an experimental dataset. Next, the auto covariance sequence coding method was used to encode these experimental datasets, and four machine learning algorithms (C4.5, random forests, naive Bayes, and k-nearest neighbors) were trained on the encoded data and used for prediction. Finally, the prediction results were analyzed.

The results showed that prediction accuracy differs across the experimental datasets and increases as protein sample repeatability increases.
Therefore, it is concluded that machine learning-based PPI prediction results are affected by sample repeatability: the higher the sample repeatability, the higher the prediction accuracy. The effect of sample repeatability on prediction results should be taken into account whenever machine learning is used to predict PPIs on datasets with sample repeatability problems.
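
The auto covariance coding step mentioned in the abstract can be illustrated with a minimal Python sketch. It uses the commonly cited form of the descriptor, AC(lag) = 1/(n - lag) * sum over i of (x_i - mean)(x_{i+lag} - mean), computed for each physicochemical property along the sequence. The abstract does not give the property scales, the maximum lag, or how the two proteins of a pair are combined, so everything below is an assumption for illustration: the function names, the placeholder `TOY_SCALE` values, `max_lag=2`, and the concatenation of the two protein vectors.

```python
from typing import Dict, List, Sequence


def auto_covariance(profile: Sequence[float], max_lag: int) -> List[float]:
    """Return AC(lag) for lag = 1..max_lag over one per-residue property profile."""
    n = len(profile)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_protein(
    seq: str, scales: Sequence[Dict[str, float]], max_lag: int
) -> List[float]:
    """Concatenate the AC values of every property scale into one fixed-length vector."""
    features: List[float] = []
    for scale in scales:
        profile = [scale[residue] for residue in seq]
        features.extend(auto_covariance(profile, max_lag))
    return features


def encode_pair(
    seq_a: str, seq_b: str, scales: Sequence[Dict[str, float]], max_lag: int
) -> List[float]:
    """Assumption: a protein pair is represented by concatenating both protein vectors."""
    return encode_protein(seq_a, scales, max_lag) + encode_protein(seq_b, scales, max_lag)


# Placeholder values for a four-letter toy alphabet; NOT a real physicochemical scale.
TOY_SCALE = {"A": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}

pair_vector = encode_pair("ACDE", "EDCA", [TOY_SCALE], max_lag=2)
print(len(pair_vector))  # 2 proteins * 1 scale * 2 lags = 4 features
```

Because the vector length depends only on the number of scales and the maximum lag, sequences of different lengths map to feature vectors of the same size, which is what lets classifiers such as those named in the abstract be trained on the encoded pairs.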
Keywords/Search Tags: protein-protein interaction, machine learning, accuracy, repeatability, dataset