Sequential Monte Carlo and Dirichlet mixtures for extracting protein alignment models

Posted on: 2005-12-06
Degree: Ph.D
Type: Dissertation
University: Stanford University
Candidate: Logvinenko, Tanya
GTID: 1450390008992484
Subject: Statistics
Abstract/Summary:
In this dissertation we present methods for aligning a pair of protein sequences and for finding similarities among multiple sequences. Commonly used non-Bayesian methods for aligning biological sequences often produce alignments that maximize some scoring function. However, the choice of model parameters can strongly influence the resulting alignment. In addition, in the absence of a statistical model, the significance of the produced alignment cannot be assessed. To address these issues, we formulate the sequence comparison problem in Bayesian terms. Two Bayesian methods for aligning a pair of protein sequences are described and implemented, and a rule for assessing the significance of the resulting alignments is prescribed. For aligning multiple protein sequences, a novel Bayesian method is proposed. Using a Bayesian formulation of the problem and a sequential Monte Carlo framework, the method progressively incorporates all sequences into the alignment. The final alignment is then improved using Bayesian methodologies such as the Gibbs sampler and simulated annealing. A comparison study of methods for biological sequence alignment, based on sets of protein sequences for which the true biological alignments are known, is presented. The novel Bayesian methods for pairwise and multiple sequence alignment perform at least as well as, and often better than, the other methods for sequence alignment.
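The abstract's remark that scoring parameters can strongly influence a score-maximizing alignment can be illustrated with a small sketch. The code below is not the dissertation's method: it is a plain Needleman-Wunsch-style global alignment with a linear gap penalty, and the function name `global_align`, the match/mismatch/gap values, and the toy sequences are all assumptions chosen for illustration.

```python
def global_align(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Score-maximizing global alignment with a linear gap penalty.

    Illustrative only: scoring values are hypothetical, not taken
    from the dissertation.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )

    # Trace back one optimal path (ties prefer the diagonal move).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Same sequences, two gap penalties: the optimal alignment changes.
strict = global_align("ACGT", "AGCT", gap=-10.0)   # gaps are expensive
lenient = global_align("ACGT", "AGCT", gap=-0.5)   # gaps are cheap
print(strict)
print(lenient)
```

With the expensive gap penalty the best alignment is ungapped, while the cheap penalty favors inserting gaps to gain an extra match. Neither result comes with a measure of significance, which is the gap the Bayesian formulation described above is meant to address.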
Keywords/Search Tags: Alignment, Protein, Methods, Aligning