Comparing and modeling protein structure

Posted on: 2005-02-21
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Kolodny, Rachel
Full Text: PDF
GTID: 1450390008987406
Subject: Computer Science
Abstract/Summary:
Proteins are remarkably versatile macromolecules involved in essentially all biological processes. The detailed three-dimensional structure of a protein encodes its function. A fundamental computational challenge in the study of proteins is the comparison and modeling of protein structure. Structural similarities between proteins can hint at distant evolutionary relationships that are impossible to discern from protein sequences alone. Consequently, structural comparison, or alignment, is an important tool for classifying known structures and analyzing their relationships. Efficient models are crucial for structure prediction, in particular for the generation of decoy sets (ab initio protein folding) and loop conformations (homology modeling).

The first part of this work focuses on protein structural alignment, namely the comparison of two structures. We formalize this problem as the optimization of a geometric similarity score over the space of rigid-body transformations. This leads to an approximate polynomial-time alignment algorithm. Our result is theoretical rather than practical: it proves that, contrary to previous belief, the problem is not NP-hard. We also present a large-scale comparison of six publicly available structural alignment heuristics and evaluate the quality of their solutions using several geometric measures. We find that our geometric measure can identify a good match, providing a method of analysis that augments the traditional use of ROC curves and their need for a classification gold standard.

In the second part, we present and use an efficient model of protein structure. Our model concatenates elements from libraries of commonly observed protein backbone fragments into structures that approximate proteins well. There are no additional degrees of freedom, so a string of fragment labels fully defines a three-dimensional structure; the set of all strings defines the set of structures (of a given length). By varying the size of the library and the length of its fragments, we generate structure sets of different resolution. With larger libraries the approximations are better, but we get good fits to real proteins with fewer than five states per residue. We also describe uses for these libraries in protein structure prediction and loop modeling.
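To make concrete the idea that a string of fragment labels fully defines a three-dimensional structure, here is a minimal Python sketch. It is an illustration under stated assumptions, not the dissertation's implementation: the joining rule (superimposing the first three C-alpha atoms of each new fragment onto the last three atoms of the chain), the two-entry toy library, and the names `superpose`, `build_chain`, `ideal_helix`, and `toy_strand` are all hypothetical. The least-squares superposition uses the standard Kabsch SVD method, which also gives the best rigid-body transformation for two point sets whose correspondence is already known.

```python
import numpy as np


def superpose(mobile, target):
    """Kabsch fit: return (R, t) minimizing sum ||R @ m_i + t - q_i||^2."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    H = (mobile - mc).T @ (target - tc)          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # +1 or -1; blocks reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # proper rotation
    t = tc - R @ mc
    return R, t


def build_chain(labels, library, overlap=3):
    """Turn a sequence of fragment labels into C-alpha coordinates.

    Assumed joining rule (hypothetical): each new fragment is rigidly
    placed so that its first `overlap` atoms best fit the last `overlap`
    atoms of the chain, and only its remaining atoms are appended.
    """
    chain = np.array(library[labels[0]], dtype=float)
    for label in labels[1:]:
        frag = np.asarray(library[label], dtype=float)
        R, t = superpose(frag[:overlap], chain[-overlap:])
        placed = frag @ R.T + t                  # apply the rigid motion
        chain = np.vstack([chain, placed[overlap:]])
    return chain


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Toy helix trace; the geometry values are approximate."""
    i = np.arange(n)
    a = np.deg2rad(turn_deg) * i
    return np.column_stack([radius * np.cos(a), radius * np.sin(a), rise * i])


def toy_strand(n, step=3.3, offset=0.9):
    """Toy planar zigzag standing in for an extended fragment."""
    i = np.arange(n)
    return np.column_stack([step * i, offset * (-1.0) ** i, np.zeros(n)])


if __name__ == "__main__":
    # Two-state toy library of 5-residue fragments (illustrative only).
    library = {"H": ideal_helix(5), "E": toy_strand(5)}
    coords = build_chain("HHE", library)
    print(coords.shape)
```

In this sketch each fragment after the first contributes `len(fragment) - overlap` new residues, so a library of a given size and fragment length fixes how many alternatives exist per added residue. When the overlapping atoms of two fragments are not congruent, the fit is only a least-squares one.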
Keywords/Search Tags: Protein, Structure, Modeling