
Methods for haplotype construction and their applications

Posted on: 2009-01-03
Degree: Ph.D
Type: Dissertation
University: University of California, Los Angeles
Candidate: Ayers, Kristin Lynn
Full Text: PDF
GTID: 1444390005954125
Subject: Biostatistics
Abstract/Summary:
Haplotypes are frequently used in association testing and can improve the power to detect a disease locus. The EM algorithm is a widely used method for haplotype frequency estimation in short regions showing linkage disequilibrium. The optimal size of these regions, referred to as blocks or windows, has come into question when imputing maternal and paternal haplotypes. We propose two methods to improve haplotype imputation.

Chapters 2 and 3 describe a dictionary model for haplotyping and its applications. According to the model, a haplotype is constructed by randomly concatenating haplotype segments drawn from a given dictionary of such segments. The dictionary model produces a parsimonious list of overlapping haplotype segments, which may parallel what remains of full-length ancestral haplotypes after recombination and mutation have broken them into smaller fragments. Likelihood evaluations rely on forward and backward recurrences similar to those encountered in hidden Markov models. Parameter estimation is carried out with the EM algorithm.

The estimated haplotype segments in the dictionary may be used to haplotype (or phase) individuals and to estimate missing genotypes using an MCMC method. The true pair of haplotypes corresponding to a person's multimarker genotype is reconstructed using a Markov chain that visits haplotype pairs according to their posterior probabilities. The dictionary model also yields expected counts of conserved haplotype segments, which can be used as genetic predictors in association testing.

Chapter 4 proposes a diversity penalty for the frequently used EM algorithm for haplotype frequency estimation. The standard EM algorithm can accommodate the penalty if one passes to a more general MM (minorize-maximize) scheme for estimation. Our MM algorithm can improve haplotype frequency estimation, haplotyping, and missing data imputation by enforcing parsimony in the estimation of haplotype frequencies. The penalty automatically and quickly discards potential haplotypes with low explanatory power. Our new MM algorithm converges in fewer iterations, dramatically reduces the computational complexity of each iteration, and eliminates marginal haplotypes from further consideration. Compared to naive application of the EM algorithm, imposing the diversity penalty yields large decreases in computation time with modest improvement in haplotyping and genotype imputation.
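
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard (unpenalized) EM algorithm for haplotype frequency estimation from unphased genotypes. It is not the dissertation's dictionary model or its penalized MM algorithm, whose details are not given here. The function names, the 0/1/2 genotype coding at biallelic markers, and the assumption of random mating with no missing data are all illustrative choices made for this sketch.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Yield every ordered haplotype pair (h1, h2) consistent with a genotype.

    genotype: tuple of 0/1/2 counts of the '1' allele at each marker
    (hypothetical coding chosen for this sketch).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites are fixed: 0 -> 0, 2 -> 1
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # heterozygous site: the two alleles differ
        yield tuple(h1), tuple(h2)


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming random mating."""
    pairs = [list(compatible_pairs(g)) for g in genotypes]
    haps = sorted({h for ps in pairs for pair in ps for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: posterior weight of each compatible pair given current frequencies
        for ps in pairs:
            weights = [freq[h1] * freq[h2] for h1, h2 in ps]
            total = sum(weights)
            for (h1, h2), w in zip(ps, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


if __name__ == "__main__":
    # Two-marker toy data: homozygotes anchor the phase of the double heterozygotes.
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    for hap, f in em_haplotype_frequencies(sample).items():
        print(hap, round(f, 4))
```

In this sketch every haplotype compatible with any genotype stays in the candidate list for every iteration, and the number of compatible pairs doubles with each heterozygous marker. That cost is what motivates both short windows and the parsimony-enforcing penalty described in the abstract.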
Keywords/Search Tags:Haplotype, EM algorithm, Improve, Used