
Research On Fold Recognition Based On Sequence Information

Posted on: 2020-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhong
GTID: 2370330590973928
Subject: Computer Science and Technology
Abstract/Summary:
With the deepening study of molecular biology, biological data of many types are growing exponentially. However, constrained by manpower, material resources, and the current state of the art, information on protein structure and function accumulates slowly. Predicting a protein's fold is a key step in predicting its structure and function. Although traditional biological experiments can determine protein folds accurately, they are time-consuming and expensive. It is therefore necessary to exploit the sequence information of the large number of existing proteins and to study the problem with machine learning methods. In this thesis, we use protein sequence information, genetic information, and hierarchical information, combined with a variety of machine learning classification algorithms, to explore protein fold prediction in depth. The specific research content is as follows.

First, protein classification is hierarchical, yet existing methods make little use of this hierarchical information. This thesis proposes using structured support vector machines to bring it into the classification process. Three features commonly used in the field, ACC-PSSM, RPSSM, and MEDP, are selected for comparative experiments. The results show that the structured support vector machine improves recognition accuracy by 2.7%-6.4% over the ordinary support vector machine. After the three features are integrated with a simple summation strategy, accuracy reaches 69.0%. These results verify the value of hierarchical information in protein fold recognition.

Second, feature extraction plays an important role in protein fold recognition. ACC-PSSM, a feature extraction method based on the position-specific scoring matrix, is adopted by many classification methods in the field and performs outstandingly, so we attempt to optimize it further. Three optimization schemes for ACC-PSSM are proposed, and their experimental results are compared and summarized. In-depth analysis shows that the modified ACC-PSSM variants perform worse overall but classify some categories significantly better than the original ACC-PSSM, which leads us to speculate that the variants and the original are complementary. To exploit this complementarity and give full play to the strengths of each feature, this thesis proposes an optimization strategy based on selecting optimal sub-methods: the multi-class fold recognition problem is decomposed into two-class problems, and a sub-classifier is selected for each two-class problem. Experiments show that this strategy is very effective: accuracy reaches 78.3%, exceeding that of every single classifier.

Third, convolutional neural networks can automatically learn information useful for classification from large amounts of data. We therefore train neural networks on larger datasets, use the trained networks to extract features for small datasets, and finally classify with traditional classification algorithms. To achieve higher accuracy, we combine the methods and features of Chapter 3 with the features extracted by the convolutional neural networks, which raises accuracy to 79.4%. The feasibility of using convolutional neural networks to extract features for protein fold recognition is discussed.
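
The optimal sub-method selection strategy mentioned in the abstract (split the multi-class problem into two-class problems, then pick a sub-classifier for each) might look roughly like the sketch below. The abstract does not give the details, so several choices here are assumptions for illustration only: a one-vs-one decomposition, selection by accuracy on a held-out validation set, majority voting at prediction time, and a nearest-centroid rule standing in for the real classifiers. The feature view names `view_a` and `view_b` and all toy numbers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_binary(X, y, a, b):
    """Nearest-centroid stand-in (an assumption) for a two-class classifier."""
    ca = centroid([x for x, lab in zip(X, y) if lab == a])
    cb = centroid([x for x, lab in zip(X, y) if lab == b])
    return lambda x: a if sq_dist(x, ca) <= sq_dist(x, cb) else b


def select_sub_methods(train, valid, y_train, y_valid):
    """For every class pair, keep the feature view whose two-class
    classifier is most accurate on that pair's validation samples."""
    chosen = {}
    for a, b in combinations(sorted(set(y_train)), 2):
        best = None
        for name in sorted(train):
            clf = fit_binary(train[name], y_train, a, b)
            pairs = [(x, lab) for x, lab in zip(valid[name], y_valid)
                     if lab in (a, b)]
            acc = sum(clf(x) == lab for x, lab in pairs) / max(len(pairs), 1)
            if best is None or acc > best[0]:
                best = (acc, name, clf)
        chosen[(a, b)] = (best[1], best[2])
    return chosen


def predict(chosen, sample):
    """Majority vote over the selected pairwise classifiers."""
    votes = Counter(clf(sample[name]) for name, clf in chosen.values())
    return votes.most_common(1)[0][0]


# Toy data: view_a separates classes 0 and 1, view_b isolates class 2.
y_train = [0, 0, 1, 1, 2, 2]
train = {
    "view_a": [[0.0], [0.2], [1.0], [1.2], [0.1], [1.1]],
    "view_b": [[0.0], [1.0], [0.2], [1.0], [5.0], [5.2]],
}
y_valid = [0, 1, 2, 2]
valid = {
    "view_a": [[0.1], [1.1], [0.0], [1.2]],
    "view_b": [[0.5], [0.4], [5.1], [4.9]],
}

chosen = select_sub_methods(train, valid, y_train, y_valid)
selected_views = {pair: name for pair, (name, _) in chosen.items()}
print(selected_views)
print(predict(chosen, {"view_a": [1.15], "view_b": [0.55]}))
```

In this toy setup no single view handles all three class pairs well, which mirrors the complementarity the abstract speculates about: the selection step is free to use a different feature for each pair.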
Keywords/Search Tags: protein folding recognition, optimal submethod, hierarchical classification, convolutional neural network