
Phylogenetic structural modeling of molecular evolution

Posted on: 2009-12-04
Degree: Ph.D.
Type: Dissertation
University: Universite de Montreal (Canada)
Candidate: Rodrigue, Nicolas
GTID: 1440390002993461
Subject: Biology
Abstract/Summary:
The field of computational molecular biology is at an early stage. Despite major advances in producing and gathering large quantities of molecular data, the models developed to explain such data remain a far cry from a suitable level of realism. For instance, most phylogenetic models of molecular sequence evolution assume that each position of an alignment evolves independently of all other positions, a computationally motivated simplification well known to be biologically unsound.

In this work, we explore different computational methods for the study of phylogenetic models that allow for a general interdependence between the amino acid positions of a protein, or between the codons of the associated gene. The models focus on site-interdependencies resulting from sequence-structure compatibility constraints, using simplified molecular structure representations in combination with a set of statistical potentials, which are themselves derived from a database of resolved protein structures. This structural compatibility criterion defines a sequence fitness concept, and the methods developed can incorporate different site-interdependent sequence fitness measurements. We apply Bayesian methods of model selection and assessment (based on numerical calculations of marginal likelihoods and on posterior predictive checks) to evaluate evolutionary models encompassing the site-interdependent framework. Through our consideration of different levels of data interpretation (focusing either on amino acid sequences only or on coding nucleotide sequences), we propose the concept of phenomenological benchmarking as a means of guiding and assessing mechanistic modeling strategies.

Our applications of these methods to real data indicate that considering sequence-structure compatibility requirements, as done here, leads to an improved model fit for all datasets studied. Yet we find that the use of potentials alone does not suitably account for across-site rate heterogeneity or amino acid exchange propensities, and more work is needed to establish whether richer forms of potentials, or other types of sequence fitness concepts, might better capture such features. In the meantime, the most favored models combine the use of statistical potentials with a suitably rich and well-posed site-independent model. We propose several avenues meriting further investigation, leading to a research expanse with possible impacts on phylogenetic inference, the detection and characterization of selective features, protein structure prediction, protein-protein interactions, and computational protein design.

Keywords: molecular evolution; phylogeny; protein tertiary structure; statistical potential; Markov chain Monte Carlo; Bayesian statistics; phenomenological modeling; mechanistic modeling.
Keywords/Search Tags: Molecular, Modeling, Phylogenetic, Sequence