Research On Robustness Of Protein Folding/Unfolding Rate Prediction Model

Posted on: 2019-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
GTID: 2370330563997683
Subject: Statistics
Abstract/Summary:
An important research topic in modern molecular biology is the analysis of the protein folding mechanism through accurate prediction of the protein folding rate. However, the problem has proved far more complex than expected and remains one of the major open problems in the field. Complex problems are often approached with divide-and-conquer strategies, so discovering new factors that affect the protein folding rate is of great significance to the study of protein folding.

Current studies have found that the main factors affecting the protein folding rate are the size of the protein (chain length) and its structural topology. The folding rate of multi-state proteins is dominated mainly by chain length, while the folding rate of two-state proteins is controlled mainly by structural topology. In addition, some authors have used machine learning algorithms based on the physicochemical properties of amino acids to predict the protein folding rate and report better prediction results, which suggests that amino acid sequence composition is also one of the factors affecting the folding rate. Our research team has proposed a new parameter for studying the protein folding rate: the cumulative backbone torsion angles (CBTA). Some studies have shown a high correlation between CBTA and the protein folding rate. Taken together, the existing research yields four main parameters for predicting the protein folding rate: CBTA, absolute contact order (ACO), protein chain length (L), and protein amino acid composition (AAC).

We used CBTA, ACO, L and AAC as parameters to construct univariate or multivariate linear regression models. On a newly collected experimental dataset of protein folding and unfolding rates, the prediction accuracy, robustness and generalization ability of these models were analyzed by the jackknife test or by cross-validation. In terms of prediction accuracy, the models rank CBTA > ACO > L > AAC, and in terms of robustness the ranking is likewise CBTA > ACO > L > AAC. The CBTA, ACO and L models have roughly equivalent generalization ability, and all three generalize significantly better than the AAC model. The results show that, on the current dataset, the CBTA model has the best prediction accuracy, robustness and generalization ability of the four models. At present, the prediction of protein folding rates is restricted by the small amount of available data, which is unfavorable to the multifactor machine learning models that have been proposed. Existing results based on multifactor models should be validated carefully, as there may be a high risk of overfitting.

In addition, the protein folding kinetics types in the current protein folding rate dataset were identified. CBTA and the relative contact order (RCO) were combined into a feature vector, and the folding kinetics type was classified with logistic regression (LR) and support vector machine (SVM) algorithms, respectively. The recognition accuracy is 0.839 for the SVM algorithm and 0.830 for the LR algorithm.
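The jackknife evaluation of a univariate linear regression model mentioned in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's actual code: the function names (fit_line, pearson, jackknife_predictions) are hypothetical, the data are invented toy values rather than measured folding rates, and the use of the Pearson correlation between predicted and observed values as the accuracy measure is an assumption.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    su = sum((ui - mu) ** 2 for ui in u) ** 0.5
    sv = sum((vi - mv) ** 2 for vi in v) ** 0.5
    return cov / (su * sv)


def jackknife_predictions(x, y):
    """Leave each sample out in turn, refit on the rest, predict the held-out one."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        preds.append(a + b * x[i])
    return preds


# Hypothetical toy data: one structural descriptor per protein (standing in
# for a parameter such as CBTA) and a log folding rate. Not real measurements.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ln_kf = [9.1, 7.8, 7.2, 5.9, 5.1, 3.8, 3.2, 1.9]

# Correlation when the model is fitted and evaluated on the same data.
a, b = fit_line(descriptor, ln_kf)
fitted = [a + b * xi for xi in descriptor]
r_fit = pearson(fitted, ln_kf)

# Correlation when every prediction comes from a model that never saw that sample.
r_jack = pearson(jackknife_predictions(descriptor, ln_kf), ln_kf)

print(f"slope = {b:.3f}, r (self-consistency) = {r_fit:.3f}, r (jackknife) = {r_jack:.3f}")
```

A large gap between the self-consistency correlation and the jackknife correlation would be one sign of the overfitting risk the abstract warns about for multifactor models trained on small datasets.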
Keywords/Search Tags: Protein folding rate, Cumulative backbone torsion angles, Prediction model, Robustness