Relationship Between Prediction Results Of Machine Learning-based Protein-protein Interaction And Sample Repeatability

Posted on:2016-03-03

Degree:Master

Type:Thesis

Country:China

Candidate:T Cheng

GTID:2180330479993919

Subject:Computer system architecture

Abstract/Summary:

Protein-protein interaction (PPI) plays a significant role in many biological processes and underlies the normal physiological functions of all living organisms. Although a large amount of PPI data has been accumulated through experimental methods, the experimental results always contain some level of false positives and false negatives. In recent years, computational approaches, chiefly machine learning, have been widely applied to PPI prediction, and many researchers have reported high accuracy. However, experiments have shown that the accuracy of most methods is overestimated and that the prediction results depend on the datasets used.

This thesis combined several machine learning algorithms with the auto covariance sequence coding method to predict human PPIs and to study how protein repeatability in the datasets affects the prediction results. First, positive datasets with varying protein repeatability were constructed using dense-graph and sparse-graph mining principles together with the maximum matching algorithm for simple graphs from graph theory. The corresponding negative datasets were constructed by correlation calculations on the graph adjacency matrix, and each positive dataset was combined with its corresponding negative dataset to form an experimental dataset. Then the experimental datasets were encoded with the auto covariance method, and four machine learning algorithms (C4.5, random forests, naive Bayes, and k-nearest neighbors) were trained and tested on the encoded data. Finally, the prediction results were analyzed.

The results showed that prediction accuracy differs across the experimental datasets and increases as protein sample repeatability increases. It is therefore concluded that machine learning-based PPI prediction is affected by sample repeatability: the higher the sample repeatability, the more accurate the prediction results. The effect of sample repeatability should be taken into account whenever machine learning is used to predict PPIs on datasets in which proteins recur.
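
As a rough illustration of the auto covariance coding step named in the abstract, the sketch below encodes a protein sequence as the lagged covariance of one per-residue property and concatenates the two encodings of a pair. The abstract does not state which physicochemical properties, how many of them, or which maximum lag the thesis used, so the Kyte-Doolittle hydropathy scale, the default `max_lag=3`, and the function names `auto_covariance` and `encode_pair` are all assumptions made for this example. Any normalization of the property table is also omitted.

```python
# Kyte-Doolittle hydropathy values, used here only as an example descriptor.
# ASSUMPTION: the abstract does not say which properties the thesis used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, scale=HYDROPATHY, max_lag=3):
    """Return one auto covariance value per lag in 1..max_lag.

    For lag d, the value is the mean over positions i of
    (x_i - mean) * (x_{i+d} - mean), where x is the property value
    of each residue and mean is taken over the whole sequence.
    """
    values = [scale[aa] for aa in sequence]  # KeyError on unknown residues
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_pair(seq_a, seq_b, max_lag=3):
    """Concatenate the encodings of both proteins into one feature vector."""
    return auto_covariance(seq_a, max_lag=max_lag) + auto_covariance(seq_b, max_lag=max_lag)


if __name__ == "__main__":
    # Two short made-up sequences, purely for demonstration.
    print(encode_pair("ACDEFGHIK", "LMNPQRSTV"))
```

Because the encoding has a fixed length regardless of sequence length, vectors like these can be fed directly to classifiers such as the four named in the abstract. With several properties, the per-property vectors would simply be concatenated.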

Keywords/Search Tags:

protein-protein interaction, machine learning, accuracy, repeatability, dataset

Related items

1	Research On Predicting Protein-protein Interactions Based On Machine Learning
2	Research On Machine Learning-based Protein-Protein Interaction Extraction
3	Researches On The Protein-Protein Interaction Prediction Method Based On The Machine Learning
4	Research On Protein Complex Accurate Recognition Based On Machine Learning
5	Prediction Of Protein Structure And Function With Machine Learning Methods
6	Research On Identification And Application Of Protein Complexes In Protein-Protein Interaction Networks
7	The Study Of The Method For Predicting Protein-protein Interactions
8	Protein-Protein Interaction:Simple Prediction Tool Development And Studies On Specific Cases
9	Predicting Protein-protein Interactions Based On Machine Learning Algorithms Using Logistic Regression Model To Improve Accuracy Of Peptide Identification In Mass Spectrometry Analysis
10	Machine Learning Applications in Genomics, Protein Folding and Protein-Protein Interaction