
Statistical methods for analyzing human genetic variation in diverse populations

Posted on: 2013-02-02
Degree: Ph.D.
Type: Dissertation
University: University of Michigan
Candidate: Wang, Chaolong
Full Text: PDF
GTID: 1453390008966570
Subject: Biology
Abstract/Summary:
The recent expansion of genetic datasets in diverse populations has allowed researchers to investigate human genetic structure and evolutionary history with unprecedented resolution. The huge amount of data also poses new statistical challenges, in both quality control and data analysis. In this dissertation, I develop statistical methods to address some challenges arising from recent population-genetic studies, and apply the methods to study the geographic structure of human genetic variation.

First, I develop a method to correct for allelic dropout, a common source of genotyping error in microsatellite data. Traditional solutions for allelic dropout often require replicate genotyping, which is costly and often impossible in population-genetic studies. To address this problem, I propose a maximum likelihood approach to estimate dropout rates from nonreplicated microsatellite genotypes. Based on simulations and empirical data, I show that this method is both accurate and fairly robust to some violations of model assumptions.

Next, I introduce a Procrustes analysis approach to compare spatial maps of genetic variation. Multivariate techniques, such as principal components analysis (PCA), have been widely used to summarize population structure, typically in two-dimensional maps, which often resemble the geographic maps of sampling locations. Using the Procrustes approach, I quantitatively demonstrate that genetic coordinates based on SNPs and CNVs are similar to each other, and are highly concordant with the geographic coordinates.

Finally, applying PCA and Procrustes analysis to SNP data from worldwide populations, I perform a systematic study to compare genes and geography across the globe. By considering examples in different regions, I find that significant similarity between genes and geography exists in general. Further, the similarity is highest in Asia and, once isolated populations have been removed, in Sub-Saharan Africa. The results provide a quantitative assessment of the geographic structure of human genetic variation worldwide.

In summary, this dissertation contributes both statistical tools for analyzing large-scale genetic data and biological insights on the spatial patterns of human genetic variation. Results from this dissertation provide a basis for evaluating the role of geography in giving rise to human population structure, and can facilitate statistical methods for inferring individual geographic origin from genetic variation.
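
The Procrustes comparison of a genetic map with a geographic map, as described in the abstract, can be illustrated with a minimal sketch. It is not the dissertation's implementation. The function names `procrustes_similarity` and `permutation_pvalue`, the similarity score sqrt(1 - D) (where D is the minimized, normalized sum of squared distances), and the permutation test are assumptions made for illustration, since the abstract does not specify the statistic or the significance test. Inputs are assumed to be two n-by-2 arrays with rows matched by individual, for example the first two principal components and the sampling coordinates.

```python
import numpy as np


def procrustes_similarity(X, Y):
    """Similarity in [0, 1] between two matched point configurations.

    Both configurations are centered and scaled to unit Frobenius norm.
    After the best orthogonal transformation (rotation or reflection) and
    rescaling of one onto the other, the minimized sum of squared distances
    is D = 1 - (sum of singular values of X0^T Y0)^2, so sqrt(1 - D) is
    simply the sum of those singular values.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X0 = X - X.mean(axis=0)           # remove translation
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)      # remove overall scale
    Y0 = Y0 / np.linalg.norm(Y0)
    s = np.linalg.svd(X0.T @ Y0, compute_uv=False)
    return float(min(s.sum(), 1.0))   # clip tiny floating-point overshoot


def permutation_pvalue(X, Y, n_perm=199, seed=0):
    """Hypothetical significance test: shuffle the row matching of Y."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed = procrustes_similarity(X, Y)
    hits = sum(
        procrustes_similarity(X, Y[rng.permutation(len(Y))]) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

A score near 1 means the two maps nearly coincide after translation, rotation or reflection, and rescaling. Allowing reflection matters here because the sign of each principal component is arbitrary.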
Keywords/Search Tags:Genetic, Human, Statistical methods, Structure, Populations, Data, Geographic