Research On Protein Structural Class Prediction Methods Based On Multi-feature Information Fusion

Posted on: 2016-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: J J Cao
GTID: 2180330467973402
Subject: Biology

Abstract/Summary:
Thanks to advances in sequencing technology, the number of sequences in public databases is growing exponentially, but the corresponding structural and functional data are not growing nearly as fast. Although protein structure and function can be determined experimentally, experiments are too time-consuming to keep pace with the growing volume of sequence data. It is therefore necessary to develop computational methods for studying the relationships among protein sequence, structure and function. Protein structural class not only describes the distribution of structural elements in high-dimensional structures, but also contributes increasingly to the study of protein structure and function. Predicting protein structural class is thus a first step in studying protein structure and function, and it has important implications for proteomics research. This thesis focuses on prediction methods for protein structural classes. The main contents are summarized as follows:

First, we reviewed the available feature extraction methods, which draw on composition information, physicochemical information and structural information. We also introduced three widely used machine learning algorithms: SVM, neural networks and K-NN. Feature extraction and the classification algorithm are the two key components of protein structural class prediction; they form the theoretical basis and the precondition for the applications in this study.

Second, we clustered the 20 amino acids into nine non-overlapping subsets using a substitution matrix and a ranking algorithm. Using these subsets, we transformed each protein sequence into a simpler one, which reduces computational complexity. Since the distribution of amino acids is random, we defined a position function and analyzed its probability distribution.
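As a rough illustration of the reduction step and a position-based description, the sketch below maps a sequence onto a nine-letter alphabet and summarizes where each group occurs. The grouping in `GROUPS`, the function names and the choice of frequency, mean and variance as summary statistics are all assumptions made for this example; the abstract does not give the actual subsets or the definition of the position function.

```python
from statistics import mean, pvariance

# Hypothetical partition of the 20 amino acids into 9 non-overlapping groups.
# The thesis derives its own subsets from a substitution matrix and a ranking
# algorithm; this grouping is for illustration only.
GROUPS = ["C", "G", "P", "AST", "DE", "NQ", "HKR", "ILMV", "FWY"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence over the reduced alphabet '0'..'8'.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper())


def position_features(seq):
    """Return 3 numbers per group: frequency, mean and variance of the
    normalized positions (k / length, k starting at 1) where it occurs."""
    reduced = reduce_sequence(seq)
    n = len(reduced)
    features = []
    for i in range(len(GROUPS)):
        positions = [(k + 1) / n for k, s in enumerate(reduced) if s == str(i)]
        if positions:
            features += [len(positions) / n, mean(positions), pvariance(positions)]
        else:
            # Group absent from this sequence.
            features += [0.0, 0.0, 0.0]
    return features
```

With nine groups this gives a fixed-length vector of 27 values regardless of sequence length, which is what a classifier such as an SVM needs as input.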
We then described the position distribution of the amino acids by calculating its numerical characteristics, and devised a new prediction method that uses both position-based sequence information and 11 secondary structural features. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed method achieved the best performance among the evaluated methods. Its overall accuracy ranges from 84.6% to 95.7%, which is 1.4% to 6.1% higher than the best-performing existing method. The quantitative analysis shows that the predicted secondary structural features outperform the content-position features, and that the two feature sets complement each other. Their combination is therefore a promising way to improve the prediction of protein structural classes.

Finally, we obtained position-specific scoring matrices (PSSMs) from PSI-BLAST profiles. We then designed an algorithm that simplifies the structure of the PSSM while retaining as much of its information as possible, yielding a reduced PSSM (Red-PSSM). Using the auto covariance transformation, we measured the global structural properties of the Red-PSSM across its columns. Taking both the structural properties of the Red-PSSM and the position-based structural features into account, we proposed a novel scheme for predicting protein structural classes. We discussed how the number of reduced classes and the number of spatial positions affect the prediction accuracy of the model. The results show that prediction accuracy increases with the number of reduced classes but decreases as the number of spatial positions grows. Through optimization, we found that the reduced PSSM built with 13 reduced classes and an interval of 2 positions achieves the best performance, which is consistent with the average interval theory of helical and sheet conformations.
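The auto covariance transformation mentioned above can be sketched as follows. This is the commonly used per-column form, in which each column of an L x K score matrix is compared with itself at a fixed lag; the function name `auto_covariance` and the parameter `max_lag` are assumptions for this example, and the thesis may define the transformation or the matrix reduction differently.

```python
def auto_covariance(matrix, max_lag):
    """Auto covariance features of a score matrix (rows = sequence positions,
    columns = residue classes), for lags 1..max_lag.

    For column j and lag g the feature is
        1 / (L - g) * sum_i (m[i][j] - mean_j) * (m[i + g][j] - mean_j)
    The result is a flat list ordered by lag, then by column.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    if not 0 < max_lag < n_rows:
        raise ValueError("max_lag must be between 1 and len(matrix) - 1")

    # Mean score of each column over the whole sequence.
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]

    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (matrix[i][j] - col_means[j]) * (matrix[i + lag][j] - col_means[j])
                for i in range(n_rows - lag)
            )
            features.append(total / (n_rows - lag))
    return features
```

For a matrix with K columns this yields `max_lag * K` features independent of sequence length; under these assumptions, 13 reduced classes with `max_lag=2` would give 26 values per protein.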
Keywords/Search Tags: Protein structural class prediction, support vector machine, feature extraction, information fusion