Research on a Protein Secondary Structure Prediction Algorithm Based on Decision Forest

Posted on:2020-05-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y P Li

GTID:2370330575992718

Subject:Computer application technology

Abstract/Summary:

Proteins are an important component of the human body, and almost all activities in the body require proteins with specific functions. The spatial structure of a protein determines its primary function, so studying protein structure helps us better understand that function. However, the spatial structure cannot be obtained directly by simulating the protein folding process. Because proteins are composed of amino acid sequences, a common approach is to predict a protein's secondary structure from its amino acid sequence as a step toward understanding its three-dimensional conformation. In an era of rapid development in big data, cloud computing, and artificial intelligence, using machine learning to predict protein secondary structure has become a research hotspot in bioinformatics. Based on the decision forest model and machine learning techniques, this thesis studies eight-class protein secondary structure prediction. The main research contents are as follows:

First, for the eight-class secondary structure prediction problem, a decision forest prediction algorithm based on gradient boosting is proposed. Using PSSM (position-specific scoring matrix) profile features of the amino acid sequence as input, the algorithm takes the second-order Taylor approximation of the cross-entropy loss function as its optimization objective. The mapping function defined by each decision tree serves as the parameter to be optimized, and each tree is built by greedily selecting the best split point over the feature values. To prevent overfitting, an L₂ regularization term is added to the objective function to control model complexity. On the standard CB513 benchmark data set for protein secondary structure prediction, the proposed algorithm achieves a Q₈ accuracy of 64.89%.

Second, to address the slow running speed of the gradient boosting decision forest algorithm, this thesis proposes a fast gradient boosting prediction model based on histograms. The model discretizes sample features into histogram bins. For large data sets, it subsamples the data with gradient-based one-side sampling, and it reduces the dimensionality of high-dimensional features with feature bundling, which enables parallelism across both the sample and feature dimensions. Extensive experiments analyze the factors that affect model performance. The results show that the fast gradient boosting algorithm proposed in this thesis reaches a Q₈ accuracy of 66.35% on the test set. In addition, on the same data set, the proposed algorithm runs very fast compared with other algorithms and has low time complexity.
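As a rough illustration of the split search the abstract describes (a second-order objective with an L₂ penalty, evaluated over histogram bins), the following minimal sketch may help. It is not the thesis's implementation: the function names `bin_feature` and `best_histogram_split`, the equal-width binning, and the parameter `l2` are assumptions made for this example, and the gain formula is the standard regularized second-order form used in XGBoost-style boosting. Gradients and Hessians are passed in, so any twice-differentiable loss (for example cross-entropy) could supply them.

```python
def bin_feature(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1] (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def best_histogram_split(bins, grads, hess, n_bins, l2=1.0):
    """Greedy split search on one binned feature.

    Returns (best_bin, best_gain); samples with bin <= best_bin go left.
    best_bin is None when no split has positive gain.
    """
    # Build the histogram: per-bin sums of first- and second-order gradients.
    g_hist = [0.0] * n_bins
    h_hist = [0.0] * n_bins
    for b, g, h in zip(bins, grads, hess):
        g_hist[b] += g
        h_hist[b] += h

    g_total, h_total = sum(g_hist), sum(h_hist)

    def score(g, h):
        # Leaf score from the second-order expansion with an L2 penalty on the leaf weight.
        return g * g / (h + l2)

    best_bin, best_gain = None, 0.0
    g_left = h_left = 0.0
    # Scan bin boundaries once; candidates are bin edges, not individual samples.
    for b in range(n_bins - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                      - score(g_total, h_total))
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain


# Toy usage: binary logistic loss at an initial prediction p = 0.5,
# so grad = p - y and hess = p * (1 - p) = 0.25 for every sample.
values = [0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
grads = [0.5 - y for y in labels]
hess = [0.25] * len(labels)
bins = bin_feature(values, n_bins=4)
split_bin, split_gain = best_histogram_split(bins, grads, hess, n_bins=4)
```

Because candidate splits are bin boundaries, the scan costs time proportional to the number of bins after a single pass over the samples, which is the usual source of the speedup in histogram-based boosting.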

Keywords/Search Tags:

Protein Secondary Structure Prediction, Sliding Window, Gradient Boosting, Decision Forest

Related items

1	Multi-Scale Encoding Of Amino Acid Sequences For Predicting Protein Interactions Using Gradient Boosting Decision Tree
2	Land Cover Classification Using The Combination Of Sentinel-2 Multi-temporal Data,Gradient Boosting Decision Tree And Random Forest
3	Study On Gradient Boosting Decision Tree And Its Improvement
4	Algorithm Research Of Protein Secondary Structure Prediction Based On Grouped Multi-Classifier
5	A Study On The Protein Secondary Structure Prediction And The Connection Between Protein Secondary Structure And Its 3D Structure
6	Peptide Fragment Ion Intensity Modeling Based On Gradient Boosting Decision Trees
7	Research And Application Of Optimizing Survival Analysis Method By Gradient Boosting Tree
8	Plant LncRNA-protein Interaction Prediction Using Deep Learning
9	Decision fusion for protein secondary structure prediction
10	The Research On Probabilistic Prediction Based On Natural Gradient Boosting(NGBoost)