
Prediction Of Protein Structural Class And Model Quality Evaluation Based On Machine Learning

Posted on: 2018-06-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang  Full Text: PDF
GTID: 2310330515460256  Subject: Computer Science and Technology
Abstract/Summary:
Proteins are the basic organic molecules that make up every cell, and they carry out the activities of life. The role of a protein depends on its function, and that function is mainly determined by its structure. Studying protein structure is therefore essential to understanding protein function. However, because the composition of proteins in the body is complex, directly simulating the folding process with molecular dynamics not only requires large computing resources but also demands that researchers have a deep understanding of protein folding. This makes it difficult to predict structure and assess model quality quickly and accurately. With the development of computer technology, predicting protein structure and assessing protein model quality with machine learning (ML) has become a hotspot in bioinformatics. This thesis covers the following three aspects:
(1) Construction of a multi-class model of protein structural class based on attribute reduction. First, the pseudo amino acid composition (PseAA) feature is extracted from the amino acid sequence of the protein. Features that fully represent the sequence information are redundant. Because structural class prediction is a multi-class problem, the ReliefF algorithm is then employed to reduce this redundancy. Next, an SVM multi-class model is constructed by recombining the outputs of several binary classifiers, and the classification results are obtained. Compared with methods that do not reduce the feature attributes, our method cuts the running time by nearly half. Nevertheless, the model parameters remain hard to determine.
(2) Design of the SAPSO algorithm to optimize the parameters of the protein structural class classification model. The parameters of the classification model are not easy to determine. Simulated Annealing (SA) can jump out of local optima, and Particle Swarm Optimization (PSO) converges quickly, so the Simulated Annealing Particle Swarm Optimization (SAPSO) algorithm is proposed to obtain optimized model parameters. Protein classification experiments demonstrate the effectiveness of the designed method.
(3) Because traditional methods evaluate model quality without considering source information, we propose an ML-based assessment model to assess the quality of protein models. The protein sequence is input into SWISS-MODEL, which automatically constructs its three-dimensional structure. The Model1 sequence and the protein sequence are then input into BLAST, and four main sequence-alignment features are extracted. With homologous information taken into account, the extracted features are used as input data to train an LS-SVM, whose parameters are optimized by the SAPSO algorithm. The GDT-TS of the protein is obtained from the LS-SVM model constructed with the optimal parameter values. The experimental results show that the designed model has clear advantages in absolute error and mean square error, which further supports the rationality and validity of the model.
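The abstract does not spell out how SA and PSO are combined, so the following is only a minimal sketch of one common hybrid, not the thesis's actual algorithm. It assumes a standard PSO velocity update in which each particle's personal guide is replaced by a Metropolis acceptance rule under a geometrically cooled temperature, while the best point ever seen is tracked separately. The function name `sapso_minimize` and all default values (`w`, `c1`, `c2`, `t0`, `cooling`) are hypothetical choices for illustration.

```python
import math
import random


def sapso_minimize(objective, bounds, n_particles=20, n_iters=100,
                   w=0.7, c1=1.5, c2=1.5, t0=1.0, cooling=0.95, seed=0):
    """Minimize `objective` over the box `bounds` with a PSO + SA hybrid.

    Illustrative sketch only: the hybridization scheme and every default
    value here are assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Per-particle guides (these may accept worse points, SA-style)
    # and the best point seen so far by any particle.
    guide = [p[:] for p in pos]
    guide_val = [objective(p) for p in pos]
    best_idx = min(range(n_particles), key=guide_val.__getitem__)
    best, best_val = guide[best_idx][:], guide_val[best_idx]

    temperature = t0
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (guide[i][d] - pos[i][d])
                             + c2 * r2 * (best[d] - pos[i][d]))
                # Move, then clamp the coordinate to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            val = objective(pos[i])
            delta = val - guide_val[i]
            # Metropolis rule: always accept an improvement; accept a
            # worse point with probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                guide[i], guide_val[i] = pos[i][:], val
            if val < best_val:
                best, best_val = pos[i][:], val
        # Geometric cooling makes worse moves rarer over time.
        temperature *= cooling
    return best, best_val
```

In the setting the abstract describes, `objective` would map a parameter vector (for example an SVM or LS-SVM penalty and kernel width) to a validation error, and `bounds` would give the search range for each parameter. Any smooth test function, such as a shifted quadratic, can stand in for that error when trying the sketch out.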
Keywords/Search Tags: Protein structure class, SVM, LS-SVM, Multi-classification, Model quality, SAPSO