
High resolution force fields and residue contact prediction models for protein structure prediction

Posted on: 2010-01-20
Degree: Ph.D.
Type: Thesis
University: Princeton University
Candidate: Rajgaria, Rohit
Full Text: PDF
GTID: 2440390002488819
Subject: Engineering
Abstract/Summary:
The process by which a protein acquires its stable and functional three-dimensional structure is referred to as protein folding. Understanding the protein folding process is one of the most important and challenging problems in computational biology. Various approaches have been proposed to determine the three-dimensional native structure of a protein from its amino acid sequence. Some of these methods use existing experimentally determined structures, whereas ab initio methods do not rely on existing structures for their predictions.

Once the predictions have been made, it is very important to identify the best structure (the one most similar to the native structure) from an ensemble of predicted structures. Anfinsen's hypothesis states that the native structure corresponds to the global minimum of the Gibbs free energy. A linear programming based model that uses this hypothesis as its main criterion has been developed to generate a Cα-Cα distance-dependent force field. A diverse protein set and an improved decoy generation technique were employed to generate a challenging set of high-quality training decoys. The Cα-Cα distance-dependent force field generated using this model was found to be very successful in selecting native structures from an ensemble of high-resolution conformers. Another linear programming based model has been developed to generate a side chain centroid distance-dependent force field that accounts for the side chain atoms of each residue. This force field was found to be more successful in discriminating between the native and non-native structures of a protein. These force fields can also be used for fold recognition and de novo protein design.

Protein structure prediction using first-principles methods is very challenging because of the enormity of the conformational space that must be searched. Any information that reduces this search space can potentially make the structure prediction method more efficient. In this thesis, two models have been developed to predict contacts between non-local residues of a protein. The first model uses an integer linear optimization formulation to predict non-local hydrophobic contacts in alpha-helical proteins. The second model also uses an integer linear optimization formulation, in this case to predict contacts between non-local residues of beta, alpha + beta, and alpha/beta proteins. The predicted contacts can be used to generate distance bounds between contacting residues. These bounds can prove very useful for first-principles methods such as ASTRO-FOLD in reducing the protein conformational search space. The problem of improving the accuracy of the predicted contacts has been addressed by generating an optimal set of filters, with the selection of optimal filters itself formulated as an integer linear optimization problem. The usefulness and effectiveness of the proposed models for tertiary structure prediction have been tested and validated on a set of test proteins, including test cases from blind protein structure prediction experiments.
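
As a rough illustration of how a distance-dependent force field can be used to pick a structure from an ensemble, the sketch below scores each conformer by summing pairwise energy terms looked up by residue-type pair and distance bin, then ranks conformers from lowest to highest energy. It is a minimal sketch under stated assumptions, not the thesis's implementation: the bin edges, the energy parameters, the sequence-separation cutoff, and all function names are hypothetical, and the linear programming step that would actually fit the parameters is not shown.

```python
import math

# Hypothetical distance bin edges in angstroms (not taken from the thesis).
BIN_EDGES = [3.0, 5.0, 7.0, 9.0]


def distance_bin(d):
    """Return the index of the bin containing distance d, or None if d is outside all bins."""
    for k in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[k] <= d < BIN_EDGES[k + 1]:
            return k
    return None


def energy(residues, coords, params, min_separation=3):
    """Sum pairwise terms over residue pairs at least min_separation apart in sequence.

    residues: list of residue type labels, one per position
    coords:   list of (x, y, z) C-alpha coordinates, one per position
    params:   dict mapping ((type_a, type_b), bin_index) -> energy value,
              with the type pair stored in sorted order
    """
    total = 0.0
    n = len(residues)
    for i in range(n):
        for j in range(i + min_separation, n):
            k = distance_bin(math.dist(coords[i], coords[j]))
            if k is None:
                continue  # pair is too close or too far to contribute
            pair = tuple(sorted((residues[i], residues[j])))
            total += params.get((pair, k), 0.0)
    return total


def rank_conformers(residues, conformers, params):
    """Return conformer names ordered from lowest to highest energy."""
    return sorted(conformers, key=lambda name: energy(residues, conformers[name], params))


# Toy data: a four-residue chain and two made-up conformers.
residues = ["L", "G", "G", "L"]
params = {
    (("L", "L"), 0): -1.0,  # hypothetical: close L-L pair is favorable
    (("L", "L"), 2): 0.5,   # hypothetical: distant L-L pair is unfavorable
}
conformers = {
    "extended": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (8.0, 0.0, 0.0)],
    "compact": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 4.0, 0.0)],
}
ranking = rank_conformers(residues, conformers, params)
print(ranking)
```

In this toy setting the conformer that brings the two L residues close together receives the lower energy and is ranked first, which mirrors the idea of treating the lowest-energy structure as the best candidate for the native fold.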
Keywords/Search Tags: Protein, Structure, Force field, Model, Conformational search space, Integer linear optimization