
Protein Tertiary Structure Prediction Based On Hierarchical Classification Model And Integration Strategy

Posted on: 2015-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Chen
GTID: 2180330431978609
Subject: Computer Science and Technology
Abstract/Summary:
Prediction of protein tertiary structure is a core problem in proteomics. Solving it helps in mining protein function and in understanding the essence of life phenomena. The problem has attracted extensive research interest, and a number of methods have been proposed to solve it. Among them, machine learning is an effective tool in biological computation and is widely used in this field. Machine-learning-based prediction of protein tertiary structure relies on useful information extracted from amino acid sequences: by analyzing this information and summarizing the underlying rules, we can predict the structure of an unknown amino acid sequence. Such prediction involves three aspects: feature extraction, construction of the classification model, and the integration strategy. The main contents of this thesis are as follows.

In this thesis, we mainly use an improved pseudo amino acid composition model and section distance frequency. The improved pseudo amino acid composition model applies Principal Component Analysis (PCA) to the physicochemical properties of amino acids to obtain the first three principal components, which then replace the three components of the physicochemical composition. Section distance frequency divides the amino acid sequence into sections and uses distance frequency to extract feature information from each section. Recent studies have shown that a single feature extraction method may lose information, so to improve prediction accuracy we fuse different features.

In this thesis, we use the Flexible Neural Tree (FNT) as the base classifier, adopt a hierarchical classification model, and propose a new integration method to solve the problem of protein tertiary structure prediction.
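The section distance frequency and feature fusion steps described above can be sketched as follows. The abstract does not define the exact sectioning or binning, so this is a minimal sketch under assumptions: the sequence is cut into `n_sections` near-equal contiguous sections, and within each section the gaps between consecutive occurrences of the same residue are counted into `max_d` bins (gaps of `max_d` or more share the last bin) and normalized. Fusion is shown as simple concatenation, and plain amino acid composition stands in for the other feature sets. All function names and defaults are hypothetical.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def split_sections(seq: str, n_sections: int) -> List[str]:
    """Cut seq into n_sections contiguous parts whose lengths differ by at most 1."""
    size, extra = divmod(len(seq), n_sections)
    parts, start = [], 0
    for i in range(n_sections):
        end = start + size + (1 if i < extra else 0)
        parts.append(seq[start:end])
        start = end
    return parts

def distance_frequency(section: str, max_d: int) -> List[float]:
    """Normalized histogram of gaps between consecutive occurrences of the same residue.
    Bin k (0-based) counts gaps of k + 1; gaps >= max_d share the last bin (assumption)."""
    last_seen: Dict[str, int] = {}
    counts = [0] * max_d
    for i, aa in enumerate(section):
        if aa in last_seen:
            gap = i - last_seen[aa]
            counts[min(gap, max_d) - 1] += 1
        last_seen[aa] = i
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def section_distance_frequency(seq: str, n_sections: int = 3, max_d: int = 5) -> List[float]:
    """Concatenate the distance-frequency histogram of every section."""
    features: List[float] = []
    for part in split_sections(seq, n_sections):
        features.extend(distance_frequency(part, max_d))
    return features

def amino_acid_composition(seq: str) -> List[float]:
    """Stand-in for another feature set: relative frequency of the 20 standard residues."""
    n = len(seq)
    return [seq.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]

def fuse(*feature_sets: Sequence[float]) -> List[float]:
    """Serial fusion: concatenate feature vectors into one longer vector."""
    fused: List[float] = []
    for fs in feature_sets:
        fused.extend(fs)
    return fused
```

With the default settings, `fuse(section_distance_frequency(seq), amino_acid_composition(seq))` yields 3 × 5 + 20 = 35 features per sequence. Concatenation is the simplest fusion choice; it keeps every feature but grows the vector, which is one reason feature reduction such as PCA is often paired with it.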
FNT is a machine learning method that can optimize both its structure and its parameters. Hierarchical classification is a multi-class classification method. The integration strategy is as follows: we use seven feature sets, namely the improved pseudo amino acid composition model, distance frequency, physicochemical composition, the fusion of physicochemical composition and distance frequency, the fusion of the improved pseudo amino acid composition model and physicochemical composition, the fusion of distance frequency and the improved pseudo amino acid composition model, and the pseudo amino acid composition model. These feature sets are used to build seven base classifiers, and a selective ensemble algorithm based on a genetic algorithm is then applied to these seven base classifiers.

Comparison of the experimental results with those of other methods shows that the prediction accuracy of our method is higher than that of the other methods, indicating that the method is feasible and, to some extent, effective.
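The genetic-algorithm-based selective ensemble step can be illustrated with a simplified sketch. The abstract does not specify the encoding, fitness function, or voting rule, so this version assumes each candidate ensemble is a binary mask over the base classifiers, fitness is majority-vote accuracy on a validation set, and the GA uses tournament selection, one-point crossover, bit-flip mutation, and elitism. It works on precomputed class predictions, so any base classifier (FNT or otherwise) can supply them. The function names and parameter defaults are hypothetical, and this is not presented as the thesis's exact algorithm.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def majority_vote(preds: Sequence[Sequence[int]], mask: Sequence[int]) -> List[int]:
    """Majority vote over the classifiers whose mask bit is 1.
    Ties go to the label first seen in classifier order (Counter.most_common is stable)."""
    chosen = [p for p, keep in zip(preds, mask) if keep]
    return [Counter(p[j] for p in chosen).most_common(1)[0][0]
            for j in range(len(preds[0]))]

def ga_select(preds: Sequence[Sequence[int]], labels: Sequence[int],
              pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.1, seed: int = 0) -> Tuple[List[int], float]:
    """Evolve a binary mask over base classifiers to maximize ensemble accuracy."""
    rng = random.Random(seed)
    n = len(preds)  # number of base classifiers (needs n >= 2 for crossover)
    cache: Dict[Tuple[int, ...], float] = {}

    def fitness(mask: List[int]) -> float:
        key = tuple(mask)
        if key not in cache:
            if not any(mask):
                cache[key] = 0.0  # an empty ensemble cannot vote
            else:
                voted = majority_vote(preds, mask)
                cache[key] = sum(v == y for v, y in zip(voted, labels)) / len(labels)
        return cache[key]

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best, fitness(best)
```

Given `preds` as seven lists of predicted class labels (one per base classifier) on a held-out validation set, `ga_select(preds, labels)` returns the selected mask and its voting accuracy. Tuning the mask on validation rather than training predictions keeps the selection from simply rewarding classifiers that memorized the training data.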
Keywords/Search Tags: Protein tertiary structure prediction, Flexible Neural Tree, Hierarchical classification, Integration strategy