
Study On The Methods To Protein Structure Prediction

Posted on: 2003-03-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L X Jin
GTID: 1104360092980358
Subject: Management Science and Engineering
Abstract/Summary:
The prediction of protein structures and functions is an urgent task in the post-genome era. This interdisciplinary field calls for knowledge of mathematics, computer science, information science, physics, systems science, and management science as well as biology. This dissertation presents research on and discussion of several problems in protein structure prediction. The main work is summarized as follows:

1. Ab initio prediction of protein structure is, in essence, a global optimization problem, and the first step in solving it is to build a mathematical model. Typically, the objective function of the model is a potential energy function, either physics-based or statistics-based. The characteristics of these two types of potential function are reviewed, and the united-residue force field, which combines features of both, is examined in detail. Based on this review, four optimization models are constructed. The models consist of different energy terms, so they can be used to evaluate the impact of each term on the predictions.

2. Computing time is one of the principal problems in protein structure prediction. In the traditional conformational search, local energy minimization consumes about 95% of the computing time. Because the objective function of the prediction model has many variables and many local minima, an improved simulated annealing algorithm for continuous optimization problems is developed. Its advantage over earlier algorithms is that it can efficiently solve large-scale continuous global optimization problems (more than 3000 variables, as demonstrated). When the algorithm is applied to protein structure prediction, the time-consuming energy minimization can be avoided, which speeds up the conformational search. The structures of Met-enkephalin and bovine despentapeptide insulin are successfully predicted with this algorithm. In addition, its convergence properties are proved with a concise approach.

3. Most existing methods for predicting protein structural classes do not take into account the order of residues along the sequence, which limits their predictive accuracy. In this work, a new prediction approach based on the subsequence distribution and the FDOD function is proposed. It is superior to earlier methods in that it includes residue-order information. Compared with the best published performance, it improves the predictive accuracy by 3.3% and 5.3% for two types of tests on the same data set. Moreover, it does not use physicochemical parameters, and it is fast and easy to implement. A data set limited to 30% sequence redundancy is derived. Tests on this data set show that the new approach is sensitive to non-homologous proteins. It is also concluded that the length of the subsequences affects the predictive results, especially for proteins with mixed secondary structures. Single data set tests give a predictive accuracy of 73%.

4. The support vector machine (SVM) is a new machine learning method. Its successful application to the prediction of protein subcellular locations demonstrates that it is more powerful than other approaches. In this work, an alternative method based on the FDOD function and amino acid composition is developed, and its performance is compared with that of the SVM method. For eukaryotic proteins, the predictive accuracy of the new method is about 2.6% higher than that of SVM; for prokaryotic proteins, an overall predictive accuracy of 89.9% is obtained. Based on the cell architecture, a hierarchical prediction scheme is constructed. This flexible scheme allows prior knowledge of the query sequence to be used, which improves the predictive accuracy.

5. The descriptors of amino acid sequences are summarized. How these descriptors influence the prediction of protein structural classes and subcellular locations is investigated. In the predict...
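
Point 2 of the abstract describes conformational search by simulated annealing on a continuous objective with many minima, without a local minimization step. The abstract does not give the improved algorithm itself, so the following is only a minimal sketch of generic textbook simulated annealing. The Rastrigin test function stands in for the potential energy function, and every name and parameter here (`simulated_annealing`, `t0`, `cooling`, `step_size`, the starting point) is an illustrative assumption, not taken from the dissertation.

```python
import math
import random


def rastrigin(x):
    """Stand-in objective with many local minima; global minimum 0 at x = 0."""
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x
    )


def simulated_annealing(f, x0, t0=10.0, cooling=0.999, steps=20000,
                        step_size=0.5, seed=0):
    """Generic continuous simulated annealing; returns (best_x, best_value).

    Hypothetical parameters: t0 is the starting temperature, cooling the
    geometric cooling factor, step_size the standard deviation of a move.
    """
    rng = random.Random(seed)
    x = list(x0)
    fx = f(x)
    best_x, best_f = list(x), fx
    t = t0
    for _ in range(steps):
        # Propose a move by perturbing one randomly chosen coordinate.
        i = rng.randrange(len(x))
        candidate = list(x)
        candidate[i] += rng.gauss(0.0, step_size)
        fc = f(candidate)
        # Metropolis rule: always accept improvements, and accept a worse
        # point with probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f


start = [3.0, -2.5, 4.0]  # arbitrary starting point
best_x, best_f = simulated_annealing(rastrigin, start)
print("start value:", rastrigin(start), "best value found:", best_f)
```

Each step costs one evaluation of the objective and no gradient or local minimization, which is the property the abstract credits for the faster conformational search. In a real prediction the variables would be the conformational degrees of freedom of the chain rather than three arbitrary coordinates.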
Keywords/Search Tags:Protein structure prediction, optimization model, simulated annealing, prediction of protein structural classes, FDOD function, prediction of protein subcellular locations, description of sequence characters
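
Point 3 of the abstract contrasts descriptors that ignore residue order with a subsequence distribution that retains it. The abstract defines neither the exact distribution nor the FDOD function, so the sketch below shows only one plausible reading: relative frequencies of overlapping length-k subsequences. The function name `subsequence_distribution` and the toy sequence are hypothetical, and no FDOD computation is attempted.

```python
from collections import Counter


def subsequence_distribution(sequence, k):
    """Relative frequencies of overlapping length-k subsequences."""
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()}


toy = "ACDCAD"        # made-up sequence, not real protein data
reordered = toy[::-1]  # same residues in a different order

# k = 1 gives the amino acid composition, which ignores residue order.
composition = subsequence_distribution(toy, 1)
composition_reordered = subsequence_distribution(reordered, 1)

# k = 2 counts adjacent pairs, so it changes when the order changes.
pairs = subsequence_distribution(toy, 2)
pairs_reordered = subsequence_distribution(reordered, 2)

print(composition == composition_reordered)
print(pairs == pairs_reordered)
```

The two sequences are indistinguishable at k = 1 but differ at k = 2. The choice of k also matters: the abstract reports that subsequence length affects the predictive results.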