
Ab Initio Protein Design Based On Statistical Energy Function

Posted on: 2016-12-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Xiong
GTID: 1220330467482424
Subject: Bioinformatics
Abstract/Summary:
The ultimate goal of protein design is to create new proteins that have a desired function. To achieve this goal, we first have to solve the problem of fixed-backbone protein design, that is, selecting amino acid sequences that will fold into a given tertiary structure. Remarkable progress has been made in this field over the past ten years, but sequence design is still limited by its low success rate. To overcome the current limitations, we developed a new statistical energy function (SEF) for ab initio protein design. The basic idea of the SEF is to estimate, for a given design target, the probability distribution of amino acids at each site and at each contacting pair of residues, based on information drawn from a non-redundant protein structure library. We use a database search strategy with an adaptive criterion so that the pairwise term simultaneously accounts for single-residue properties and the relative orientation of the residue pair, while minimizing the effects of statistical errors. Furthermore, we have made detailed but critical improvements in quantifying the degree of burial of residue positions and in correcting small-sample effects in the estimation of statistical probabilities. Finally, we combined this statistical energy function with van der Waals energy terms, carried out de novo sequence designs, and validated the results both theoretically and experimentally.

For theoretical validation, we carried out a single-site redesign test, a full-sequence redesign test, and a structure prediction test on 40 native structure templates. Secondary structure predictions and tertiary structure predictions were carried out for our designed sequences, the native sequences, and sequences designed by RosettaDesign. The sequence identity between our designed sequences and the native ones is 30.3%, slightly higher than the 29.5% obtained for sequences from RosettaDesign. Our tertiary structure prediction results, however, are much better than those of RosettaDesign: the proportion of predicted models that are highly similar to their respective design targets is 21.1%, compared with 5.8% for RosettaDesign.

Finally, we selected some of the designed sequences for experimental validation. One of the designs, Dv1ubq, folds as desired and is highly thermally stable, with a melting temperature of 123.3 ℃. In addition to this protein, we employed a β-lactamase-based system to evolve the foldability of several other designed sequences. From relatively small random mutant libraries, we obtained three well-folded mutants. We have solved the solution structures of the designed protein Dv1ubq and the mutant D1cy5M2 using NMR. Both are in excellent agreement with their respective design targets. The feedback from the experiments suggested aspects of our protein design method that can be further improved.
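As an illustration of the general idea behind a statistical energy term with small-sample correction, the following is a minimal sketch, not the formulation used in the dissertation. It assumes that the amino acids observed at structurally similar sites in a library have already been collected, and it converts their frequencies into single-site energies of the form -ln(P(aa | environment) / P(aa)). The function name `site_energies`, the `pseudocount` parameter, and the pseudocount smoothing scheme are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_energies(observed, background, pseudocount=1.0):
    """Return {aa: energy}, where energy = -ln(P(aa | environment) / P(aa)).

    observed    -- iterable of one-letter amino acid codes seen at library
                   sites that resemble the target site (hypothetical input)
    background  -- dict mapping each amino acid to its overall frequency
    pseudocount -- strength of the small-sample correction (an assumption
                   of this sketch, not a value from the dissertation)
    """
    counts = Counter(observed)
    n = sum(counts[aa] for aa in AMINO_ACIDS)
    energies = {}
    for aa in AMINO_ACIDS:
        # Smoothed estimate: with few observations it stays close to the
        # background frequency, so the energy stays close to zero.
        prior_weight = pseudocount * len(AMINO_ACIDS)
        p_env = (counts[aa] + prior_weight * background[aa]) / (n + prior_weight)
        energies[aa] = -math.log(p_env / background[aa])
    return energies


# Usage: a buried-like site where leucine dominates the observed sample.
uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
energies = site_energies("LLLLLLLLLA", uniform)
favoured = min(energies, key=energies.get)  # amino acid with lowest energy
```

In this sketch, amino acids that are enriched relative to the background receive negative energies, and an empty sample yields zero energy for every amino acid, which is the intended effect of the correction.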
Keywords/Search Tags:Protein design, statistical energy function, nuclear magnetic resonance, protein foldability, directed evolution