Protein Structural Class Prediction Based On Feature Fusion

Posted on: 2014-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: G T Shao
GTID: 2250330425481034
Subject: Computer application technology
Abstract/Summary:
At present, the number of known protein sequences is rising sharply, but the number of proteins whose structure is known is increasing slowly. Therefore, there is an urgent need for a rapid and accurate tool to predict the tertiary structure of proteins. In this thesis we study protein feature extraction methods, machine learning algorithms, and integrated (ensemble) algorithms in order to obtain a rapid and effective method for predicting protein tertiary structure.

Classifying protein tertiary structure with machine learning algorithms is essentially a pattern recognition problem. The basic hypothesis of our study is that the tertiary structure of a protein is determined solely by its amino acid sequence and that, for proteins of the same kind, the amino acid sequences share some inherent regularity. These regularities are hard to express as a mathematical formula. Using machine learning methods for protein tertiary structure prediction is a supervised learning process: protein samples whose structures are known are used to train machine learning methods such as neural networks, support vector machines, and Bayesian neural networks, so that they can make reasonable judgments when they encounter proteins of unknown category.

As in other pattern recognition problems, feature extraction is a primary step in protein tertiary structure prediction. Feature extraction is the process of translating amino acid sequence data into numeric vectors of fixed dimension, and the choice of feature extraction method has a crucial effect on prediction accuracy. There are many methods for protein feature extraction, such as the amino acid composition model (AA), the polypeptide model, the pseudo amino acid composition model (PseAA), physical and chemical properties (PCC), recurrence quantification analysis (RQA), etc.
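
To make the idea of turning a sequence into a fixed-dimension vector concrete, the following is a minimal sketch of the simplest method named above, the amino acid composition model (AA). It is an illustration only, not the thesis's implementation: the function name `amino_acid_composition`, the alphabetical ordering of the 20 residues, and the choice to skip non-standard letters are all assumptions.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes.
# The alphabetical ordering is an assumption; any fixed order works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Map a protein sequence of any length to a 20-dimensional vector.

    Each entry is the fraction of the sequence made up of one amino acid.
    Letters outside the 20 standard codes are skipped (an assumption).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Sequences of different lengths yield vectors of the same dimension.
short_vec = amino_acid_composition("ACCA")
long_vec = amino_acid_composition("MKTAYIAKQRQISFVKSHFSRQ")
print(len(short_vec), len(long_vec))
```

Whatever the sequence length, the output always has 20 entries, which is what lets a standard classifier consume it. Richer descriptors such as PseAA, PCC, or RQA would replace or extend this vector.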
In this thesis, using a best-first strategy, we select a new combination of feature extraction methods, PCC and RQA, and fuse these two kinds of features.

Protein tertiary structure prediction is a multi-class prediction problem, so we need to build a multi-class classification model. At present, most machine learning algorithms are designed for two-class problems, so when constructing a multi-class classification model we must consider how to translate the multi-class problem into two-class problems. There are many multi-class classification models, such as one-versus-one, one-versus-rest, binary tree classification, and error-correcting output codes. A multi-class classification model needs multiple base classifiers; in this thesis we choose the artificial neural network (ANN) and the flexible neural tree (FNT), respectively, as the base classifier. We construct an error-correcting output codes model and a binary tree classification model. We achieved 57.3% prediction accuracy on the 1189 dataset (sequence homology 40%) with the error-correcting output codes model; with the binary tree classification model we achieved 57.3% prediction accuracy on the 1189 dataset and 63.2% on the 640 dataset (sequence homology 25%). These results show that the work in this thesis is effective.
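
As a rough illustration of how error-correcting output codes reduce a multi-class problem to two-class problems, here is a minimal sketch. It is not the model built in the thesis: the four placeholder class names, the 7-column exhaustive code matrix, and the helper names `binary_targets` and `decode` are assumptions, and the base classifiers (ANN or FNT in the thesis) are represented only by their 0/1 outputs.

```python
# Hypothetical code matrix: one codeword (row) per class, one column per
# binary base classifier. Four classes and this exhaustive 7-bit code are
# assumptions chosen for illustration.
CODE_MATRIX = {
    "class_1": (1, 1, 1, 1, 1, 1, 1),
    "class_2": (0, 0, 0, 0, 1, 1, 1),
    "class_3": (0, 0, 1, 1, 0, 0, 1),
    "class_4": (0, 1, 0, 1, 0, 1, 0),
}


def binary_targets(labels, column):
    """Relabel multi-class labels as 0/1 targets for one base classifier.

    Base classifier `column` is trained to output the bit in that column
    of each sample's class codeword, which is a two-class problem.
    """
    return [CODE_MATRIX[label][column] for label in labels]


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda c: hamming_distance(CODE_MATRIX[c], bits))


# Training targets for base classifier 2 on a toy label list.
print(binary_targets(["class_1", "class_3", "class_4"], 2))

# The codeword of class_3 with its first bit flipped (one base classifier
# wrong) is still closest to the class_3 codeword.
print(decode((1, 0, 1, 1, 0, 0, 1)))
```

Every pair of codewords in this matrix differs in four positions, so a single wrong base classifier cannot move the prediction to another class. That tolerance is the reason for using more binary classifiers than one-versus-rest needs.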
Keywords/Search Tags: protein tertiary structure prediction, pattern recognition, machine learning, feature extraction, classification model