
Generation And Evaluation Of High-quality Structural Decoy Set In Protein Structure Prediction

Posted on: 2015-05-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Y Deng
GTID: 1220330467960379
Subject: Theoretical Physics
Abstract/Summary:
Proteins play vital roles in living organisms and are regarded as the material foundation of all life activities. A protein consists of at least one chain of amino acids linked by peptide bonds (the protein sequence), and each chain folds into a specific spatial conformation. Knowing a protein's three-dimensional structure is the basic prerequisite for understanding its function at the molecular level. Nevertheless, how a protein folds from its sequence into a specific conformation remains an unsolved problem. At present, the number of protein sequences in UniProt is nearly a thousand times the number of structures in the PDB, and this gap is still widening rapidly. Against this background, protein structure prediction has attracted worldwide attention and made great progress over the last two decades, and it is now one of the most active fields in computational biology.

Protein structure prediction usually includes the following steps: conformation initialization, conformation search, structure selection and structure refinement, and energy functions are involved in almost every step. The design and evaluation of energy functions therefore occupy a very important position in the development of protein structure prediction. There are two broad classes of energy functions. Physics-based energy functions are largely based on aspects of the known physics of molecular interactions. Knowledge-based energy functions try to capture properties of native protein conformations. The latter class is more widely used. The most direct and effective way to evaluate an energy function is to apply it to specific protein decoy sets. In our research, both the design and the evaluation of energy functions have been investigated, with the protein decoy set as the link between them.
The main contributions are as follows:

(1) Many knowledge-based energy functions take Bayes' theorem or the Boltzmann law as their theoretical basis, which commonly involves describing the observed state (native state) and the reference state. Although the reference state is the key factor that distinguishes different energy functions, it is difficult to judge which reference state is best from the performance of the original energy functions, because they used different databases and parameter cutoffs. We address this issue by evaluating the reference states with a unified database and programming environment. Six distance-specific atomic potentials with different reference states were constructed and applied to a series of protein decoy sets. The results show that the random-walk chain reference state performs relatively better than the others, but no reference state clearly outperforms the others on all decoy sets. Further analysis reveals that the statistical potential faces a trade-off between universality and pertinence, which is shaped by its reference state. The optimal reference state should be derived from the specific application environment and decoy space.

(2) When evaluating energy functions, we noticed that the majority of protein decoy sets suffer from specific structural problems related to secondary structure, structural compactness, solvent exposure and so on. We also found that the existing decoy sets often have narrow RMSD distributions and many redundant conformations. As a result, they cannot pose a real challenge to well-designed energy functions. To remedy this situation, we developed a template-based approach (3DRobot) for generating high-quality protein structural decoys. Given a protein native structure, 3DRobot first identifies multiple templates by structure alignment, then runs a replica-exchange Monte Carlo simulation on each template, and finally selects desirable decoys and performs structural refinement.
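The replica-exchange Monte Carlo step named above can be illustrated with a minimal sketch. This is not 3DRobot's implementation: the one-dimensional state, the uniform random move, the temperature ladder and the function names `swap_probability` and `replica_exchange` are all assumptions made for illustration, with k_B set to 1. It shows only the general technique, in which several replicas sample at different temperatures and neighbouring replicas periodically attempt to exchange states.

```python
import math
import random


def swap_probability(e_i, e_j, t_i, t_j):
    """Standard replica-exchange acceptance probability (k_B = 1).

    P = min(1, exp[(1/t_i - 1/t_j) * (e_i - e_j)])
    """
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def replica_exchange(energy, x0, temperatures, n_sweeps=200, step=0.5, seed=0):
    """Toy replica-exchange Monte Carlo on a 1D coordinate (illustrative only).

    Returns the list of replica states recorded after every sweep.
    """
    rng = random.Random(seed)
    states = [x0 for _ in temperatures]
    energies = [energy(x0) for _ in temperatures]
    samples = []
    for sweep in range(n_sweeps):
        # One Metropolis move per replica at that replica's temperature.
        for k, temp in enumerate(temperatures):
            trial = states[k] + rng.uniform(-step, step)
            e_trial = energy(trial)
            d = e_trial - energies[k]
            if d <= 0 or rng.random() < math.exp(-d / temp):
                states[k], energies[k] = trial, e_trial
        # Attempt swaps between neighbouring temperatures,
        # alternating between even and odd pairs on successive sweeps.
        for k in range(sweep % 2, len(temperatures) - 1, 2):
            p = swap_probability(energies[k], energies[k + 1],
                                 temperatures[k], temperatures[k + 1])
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
        samples.append(list(states))
    return samples


def double_well(x):
    """Hypothetical toy energy with two minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


if __name__ == "__main__":
    trajectory = replica_exchange(double_well, x0=-1.0,
                                  temperatures=[0.1, 0.3, 1.0, 3.0])
    print("final replica states:", trajectory[-1])
```

In a decoy-generation setting the scalar coordinate would be replaced by a protein conformation and the toy energy by a real force field. The high-temperature replicas are what allow the low-temperature ones to escape local minima through swaps.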
The 3DRobot web server is freely available to the scientific community (http://zhanglab.ccmb.med.umich.edu/3DRobot).

(3) We selected 200 solved non-redundant protein structures for 3DRobot to generate the corresponding decoy sets (named 3DRobot_set). To make a comprehensive comparison between the 3DRobot decoy sets and existing decoy sets, we also generated decoy sets for the same proteins used in the existing decoy sets. As there is no precedent for systematically evaluating decoy sets, we proposed and defined a series of evaluation criteria, derived mainly from the common defects of the existing decoy sets. By these criteria, the 3DRobot decoy sets surpass the existing decoy sets both in the quality of individual decoys and in the decoy RMSD distributions. Moreover, the distance-specific atomic potentials constructed in (1) were also applied to all the decoy sets, and the results further demonstrated the suitability of 3DRobot for energy function evaluation.
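The idea of a distance-specific statistical potential built from an observed state and a reference state, as discussed in (1), can be sketched as follows. This is an illustrative assumption rather than any of the six potentials studied in the dissertation: it uses the inverse Boltzmann relation u = -kT ln(P_obs / P_ref) with a simple averaging reference state, in which the reference distribution is obtained by pooling all atom-pair types. The function names, the pseudocount and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def build_potential(observations, n_bins, bin_width, kT=1.0, pseudocount=1e-6):
    """Build a distance-dependent potential from (pair_type, distance) records.

    Reference state (an assumption for this sketch): the distance
    distribution pooled over all pair types.
    """
    counts = defaultdict(lambda: [0.0] * n_bins)
    for pair_type, dist in observations:
        b = int(dist // bin_width)
        if 0 <= b < n_bins:
            counts[pair_type][b] += 1.0

    pooled = [sum(c[b] for c in counts.values()) for b in range(n_bins)]
    pooled_total = sum(pooled)

    potential = {}
    for pair_type, c in counts.items():
        pair_total = sum(c)
        energies = []
        for b in range(n_bins):
            if pooled[b] == 0:
                # No data at this distance for any pair type: stay neutral.
                energies.append(0.0)
                continue
            p_obs = (c[b] + pseudocount) / pair_total
            p_ref = pooled[b] / pooled_total
            energies.append(-kT * math.log(p_obs / p_ref))
        potential[pair_type] = energies
    return potential


def score(potential, contacts, bin_width):
    """Sum the potential over the (pair_type, distance) contacts of a structure."""
    total = 0.0
    for pair_type, dist in contacts:
        bins = potential.get(pair_type)
        b = int(dist // bin_width)
        if bins is not None and 0 <= b < len(bins):
            total += bins[b]
    return total


if __name__ == "__main__":
    # Hypothetical toy statistics: "CC" pairs prefer short distances,
    # "CN" pairs prefer longer ones.
    training = [("CC", 1.5), ("CC", 1.6), ("CC", 3.5),
                ("CN", 3.2), ("CN", 3.7), ("CN", 1.2)]
    pot = build_potential(training, n_bins=4, bin_width=1.0)
    native_like = [("CC", 1.5), ("CN", 3.5)]
    decoy_like = [("CC", 3.5), ("CN", 1.5)]
    print(score(pot, native_like, 1.0), score(pot, decoy_like, 1.0))
```

Changing how `p_ref` is computed is what distinguishes one reference state from another, so a comparison of reference states under a unified database amounts to swapping that single term while keeping the counting procedure fixed.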
Keywords/Search Tags: decoy set, structural decoy, protein structure prediction, energy function, statistical potential, Monte Carlo simulation