
Maximum likelihood and Bayesian methods for studying selection using DNA sequence data

Posted on: 2002-09-02
Degree: Ph.D.
Type: Thesis
University: Harvard University
Candidate: Bustamante, Carlos Daniel
Full Text: PDF
GTID: 2460390011498912
Subject: Biology
Abstract/Summary:
This thesis is a study of maximum likelihood and Bayesian methods for analyzing parametric statistical models of natural selection in population genetics and phylogenetics. It focuses on four main topics. (A) The first topic is the relationship between amino acid polymorphism, protein structure, and purifying selection. We explore this question using multivariate logistic regression models of amino acid size, physicochemical class, solvent accessibility, and secondary structure. The methods are applied to several Escherichia coli and Salmonella enterica proteins. Model selection in a likelihood framework is discussed. (B) The second topic is selection on “silent” sites in functional genes and pseudogenes of humans and murids. We explore this issue by developing a new likelihood method for detecting constrained evolution at synonymous sites and other forms of non-neutral evolution in putative pseudogenes. Two likelihood ratio tests are developed to test the hypotheses that (1) a putative pseudogene evolves neutrally and that (2) the rate of synonymous substitution in the functional ortholog of a pseudogene equals the rate of substitution in the pseudogene. The method is applied to a data set containing 120 human and rodent pseudogenes. (C) The third topic is a study of the statistical properties of the maximum likelihood estimates (MLEs) of the selection and mutation parameters in a Poisson Random Field population genetics model of directional selection on DNA. I derive the asymptotic variances and covariance of the MLEs and explore the power of the likelihood ratio tests (LRTs) of neutrality for varying levels of mutation and selection, as well as the robustness of the LRT to deviations from the assumption of free recombination among sites. (D) The fourth topic is hierarchical Poisson Random Field population genetic models of DNA polymorphism and divergence. The goal of this research is to estimate the mean and variance of the distribution of selective effects for different classes of mutations. An EM algorithm for maximum likelihood estimation is described and implemented. Likewise, Markov chain Monte Carlo methods for sampling from posterior distributions in Bayesian models are discussed and used to explore a test data set generated under the model.
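As a rough illustration of the kind of likelihood ratio test mentioned under topic (B), hypothesis (2), the sketch below tests whether two substitution rates are equal. It is a toy stand-in and not the method developed in the thesis: it assumes substitution counts are Poisson distributed with mean equal to rate times number of sites, and it refers the test statistic to a chi-square distribution with one degree of freedom. The function names and the count/site arguments are hypothetical.

```python
import math


def poisson_loglik(count, sites, rate):
    """Poisson log-likelihood, dropping the log(count!) term that does not depend on rate."""
    mean = rate * sites
    if mean == 0:
        return 0.0 if count == 0 else float("-inf")
    return count * math.log(mean) - mean


def lrt_equal_rates(k1, n1, k2, n2):
    """Toy likelihood ratio test of H0: rate1 == rate2 (assumed Poisson counts).

    k1, k2 are substitution counts and n1, n2 are numbers of sites
    (hypothetical inputs chosen for illustration only).
    Returns the test statistic and its chi-square (1 d.f.) p-value.
    """
    # Maximum likelihood estimates under the alternative (separate rates).
    r1, r2 = k1 / n1, k2 / n2
    # Maximum likelihood estimate under the null (one shared rate).
    r0 = (k1 + k2) / (n1 + n2)

    ll_alt = poisson_loglik(k1, n1, r1) + poisson_loglik(k2, n2, r2)
    ll_null = poisson_loglik(k1, n1, r0) + poisson_loglik(k2, n2, r0)

    # Twice the log-likelihood difference; clamp tiny negative rounding error.
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    stat, p = lrt_equal_rates(30, 100, 10, 100)
    print(f"LRT statistic = {stat:.3f}, p-value = {p:.4f}")
```

A small p-value would suggest that the two rates differ under these simplifying assumptions; a realistic analysis would replace the Poisson counts with a codon or nucleotide substitution model fitted on a phylogeny.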
Keywords/Search Tags:Maximum likelihood, Selection, Bayesian, Methods, DNA, Models, Explore