Several Key Problems in Protein Structure Prediction

Posted on: 2010-05-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Gu
GTID: 1100360275458047
Subject: Engineering Mechanics
Abstract/Summary:
The exponential growth of molecular sequence data began in the 1980s, when methods for DNA sequencing became widely available, and it provides abundant resources for studying the biological roles of proteins. To a great extent, the biological role of a protein is determined by its three-dimensional (3D) structure. As experimental techniques for obtaining native 3D protein conformations have developed, more and more protein structures have been determined, but the number of known structures still lags far behind the number of known sequences. Narrowing this gap has therefore become a critical task in molecular biology. Computational biology, also called bioinformatics, offers an approach to this problem. Its theoretical foundation is that all the information required to determine a protein's tertiary structure is contained in the corresponding sequence. Based on this theory, protein structure prediction methods such as homology modeling, fold recognition and ab initio modeling have been proposed and widely applied in the past decades.

This dissertation first introduces the basic theory and related knowledge of protein structure and describes the practical background and significance of protein structure prediction. It then briefly introduces and discusses the three major types of protein structure prediction methods in current use: homology modeling, fold recognition and ab initio modeling. On this basis, several key problems are studied further: protein sequence analysis, potential functions for protein fold recognition, and protein structure prediction in the "twilight zone" of sequence similarity.

Considering the advantage of wavelet packets in extracting local features of a signal, we propose a multiple sequence alignment method based on the wavelet packet transform.
Wavelet packets can accurately locate similar regions, i.e. conserved regions, among several sequences, which improves the accuracy and reduces the running time of multiple sequence alignment. The method is verified on the benchmarks "BAliBASE" and "ROSE". The results demonstrate that it performs well and is an efficient multiple sequence alignment tool.

A simplified protein fold recognition potential is proposed. The potential consists of three simple components: residue contact interaction energy, hydrophobic energy and backbone pseudodihedral torsion energy, and only 212 parameters are needed to construct it. The parameter set is determined from a protein training set by linear programming. The ability of the potential to recognize native protein structures is tested on several high-quality test sets, and the results demonstrate that it can distinguish the native structures of most proteins in the test sets. Compared with other simplified potentials, it is among the best performing and has broad application prospects.

A protein fold recognition optimization method based on a parametric evaluation function is presented. The parametric evaluation function condenses the complicated multi-objective, multi-constraint problem into a single-objective unconstrained problem, which is then solved with the conjugate gradient method. This reduces the difficulty of the optimization and keeps the solution away from the boundary of the feasible region during the optimization process, so that all objectives are optimized simultaneously. Tests on several standard test sets show that the potential determined with the parametric evaluation optimization method is of higher quality than the potential determined with linear programming.

We have designed and developed a genetic threading program, which is presented at the end of this dissertation. Its energy function is more physics-based and consists of six energy components. The inclusion of pairwise contact interactions makes matching the target sequence to the template structure an NP-complete problem. The genetic algorithm is a global heuristic method with good search ability. Tests on the Fischer benchmark show that the proposed genetic threading method has good fold recognition ability and alignment accuracy. In addition, the correlation between alignment accuracy and fold recognition results supports the soundness of the adopted energy function.

We gratefully acknowledge financial support for this work from the National Natural Science Foundation (grant 10772042), the National Basic Research Program of China (grant 2004CB518901) and High Science and Technology (grant 2006AA01A124) of China.
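
The parameter fitting described in the abstract for the simplified potential can be pictured as a set of linear inequalities: for every training protein, each decoy conformation should score a higher energy than the native one. The sketch below is an illustration only, not the dissertation's implementation. It assumes an energy that is linear in its parameters, uses made-up three-component feature vectors in place of the 212 parameters, and swaps the linear programming solver for a simple perceptron-style update that searches for any parameter set satisfying the same inequalities. All function names and numbers are hypothetical.

```python
# Illustrative sketch: fit a linear energy function so that every native
# structure scores lower than its decoys. Names and data are hypothetical.

def energy(weights, features):
    """Energy as a weighted sum of structural features (linear model)."""
    return sum(w * f for w, f in zip(weights, features))


def violated(weights, native, decoys, margin=1.0):
    """Return the decoys whose energy gap to the native is below the margin."""
    e_native = energy(weights, native)
    return [d for d in decoys if energy(weights, d) - e_native < margin]


def fit_weights(training_set, n_params, margin=1.0, rate=0.1, max_epochs=1000):
    """Perceptron-style stand-in for the linear programming step.

    Looks for weights with energy(decoy) - energy(native) >= margin for
    every (native, decoy) pair; stops when a full pass makes no update.
    """
    weights = [0.0] * n_params
    for _ in range(max_epochs):
        updated = False
        for native, decoys in training_set:
            for decoy in violated(weights, native, decoys, margin):
                # Move the weights so this decoy's energy gap increases.
                for i in range(n_params):
                    weights[i] += rate * (decoy[i] - native[i])
                updated = True
        if not updated:
            return weights
    raise RuntimeError("no separating parameter set found")


# Toy training set: (native feature vector, list of decoy feature vectors).
# The three features stand in for contact, hydrophobic and torsion terms.
TRAINING_SET = [
    ([5.0, 2.0, 1.0], [[3.0, 2.5, 1.5], [4.0, 1.0, 3.0]]),
    ([6.0, 1.0, 2.0], [[5.0, 2.0, 2.5], [6.5, 0.5, 4.0]]),
]

if __name__ == "__main__":
    fitted = fit_weights(TRAINING_SET, n_params=3)
    for native, decoys in TRAINING_SET:
        gaps = [energy(fitted, d) - energy(fitted, native) for d in decoys]
        print("energy gaps (decoy - native):", gaps)
```

A linear programming formulation would impose the same inequalities as constraints and hand them to a solver in one step; the iterative update here is used only because it needs no external library.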
Keywords/Search Tags: Protein, Sequence alignment, Potential function, Fold recognition, Protein structure prediction, Threading