
Statistical models and Monte Carlo methods for protein structure prediction

Posted on: 2003-12-18
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Schmidler, Scott Curtis
Full Text: PDF
GTID: 1460390011481694
Subject: Chemistry
Abstract/Summary:
As we enter the post-genome era, widespread availability of genomic data promises to revolutionize biomedicine, providing fundamental insights into the molecular mechanisms of disease and pointing the way to developing novel therapies. However, important hurdles remain, including understanding the function and mechanism of the proteins encoded by genomic sequences. While function and mechanism are dictated by a protein's native structure, prediction of protein structure from sequence remains a difficult unsolved problem.

In this dissertation, I develop a novel framework for protein structure prediction from amino acid sequence, based on a new class of generalized stochastic models for sequence/structure relationships. I introduce a formal Bayesian framework for synthesizing the varied sources of sequence information in structure prediction using joint sequence-structure probability models based on structural segments. I describe a set of probabilistic models for structural segments characterized by conditional independence of inter-segment positions, develop efficient algorithms for prediction in this class of models, and evaluate this approach via cross-validation experiments on experimental structures. This approach yields secondary structure prediction accuracies comparable to the best published methods, and provides accurate estimates of prediction uncertainty, allowing identification of regions of a protein predicted at even higher accuracies.

I then generalize this Bayesian framework to models of the non-local interactions in protein sequences involved in tertiary folding. I develop Monte Carlo algorithms for inference in this class of models, and demonstrate this approach with models for correlated mutations in β-sheets. Case studies and cross-validation experiments demonstrate this approach for predicting β-strand contact maps, providing important information about protein tertiary structure from sequence alone.

This dissertation provides a suite of statistical models and computational tools for protein structure prediction. In addition, the models developed here generalize existing stochastic models in important ways. I relate these new models to existing generalized hidden Markov and stochastic segment models, showing the latter to be special cases of the former. Further, the interaction models developed here represent a novel class of stochastic models for sequences of random variables with complex long-range dependency structure. These new models, and the associated algorithms, are likely to be of broader statistical interest.
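
As a rough illustration of the segment-based idea mentioned in the abstract, the sketch below implements a forward recursion for a generic textbook-style stochastic segment model (a hidden semi-Markov model). It is not the dissertation's model: the three segment types, the two-letter alphabet, the length cap `MAX_LEN`, and every probability table are hypothetical values chosen only so the example runs. Residues within a segment are assumed independent given the segment type, which is the simplest possible choice.

```python
# Toy stochastic segment model. All names and numbers are illustrative assumptions.
STATES = ("H", "E", "L")  # helix, strand, loop (toy segment types)
MAX_LEN = 4               # hypothetical cap on segment length

# Hypothetical residue probabilities per segment type over a 2-letter alphabet.
EMIT = {
    "H": {"a": 0.7, "b": 0.3},
    "E": {"a": 0.2, "b": 0.8},
    "L": {"a": 0.5, "b": 0.5},
}
# Hypothetical P(length | type) for lengths 1..MAX_LEN.
LENGTH = {
    "H": [0.1, 0.2, 0.3, 0.4],
    "E": [0.3, 0.3, 0.2, 0.2],
    "L": [0.4, 0.3, 0.2, 0.1],
}
# Hypothetical probabilities for the first segment type and for type changes.
# Consecutive segments always differ in type, so there are no self-transitions.
INIT = {"H": 0.3, "E": 0.3, "L": 0.4}
TRANS = {
    "H": {"E": 0.3, "L": 0.7},
    "E": {"H": 0.3, "L": 0.7},
    "L": {"H": 0.5, "E": 0.5},
}


def segment_likelihood(state, residues):
    """P(length) times the product of residue probabilities for one segment."""
    p = LENGTH[state][len(residues) - 1]
    for r in residues:
        p *= EMIT[state][r]
    return p


def forward(seq):
    """Probability of seq summed over all segmentations and type assignments."""
    n = len(seq)
    # alpha[t][s]: probability of seq[:t] with a segment of type s ending at t.
    alpha = [{s: 0.0 for s in STATES} for _ in range(n + 1)]
    for t in range(1, n + 1):
        for s in STATES:
            total = 0.0
            for d in range(1, min(MAX_LEN, t) + 1):
                start = t - d
                if start == 0:
                    prev = INIT[s]  # this segment is the first one
                else:
                    prev = sum(alpha[start][r] * TRANS[r][s]
                               for r in STATES if r != s)
                total += prev * segment_likelihood(s, seq[start:t])
            alpha[t][s] = total
    return sum(alpha[n].values())


if __name__ == "__main__":
    print(forward("abba"))
```

The recursion costs on the order of `len(seq) * MAX_LEN * len(STATES)**2` operations, whereas enumerating segmentations directly grows exponentially with sequence length. An ordinary hidden Markov model corresponds to the special case where every segment has length 1 or, equivalently, where segment lengths follow a geometric distribution induced by self-transitions.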
Keywords/Search Tags:Models, Structure, Statistical