
Two-Stage Design and Analysis for Genome-Wide Association Studies

Posted on: 2013-02-21 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: D D Pan
GTID: 1114330374959565 | Subject: Probability theory and mathematical statistics
Abstract/Summary:
Genome-wide association studies (GWAS) have emerged as an important tool for discovering susceptibility loci of complex diseases and have successfully identified thousands of genetic variants (single-nucleotide polymorphisms, SNPs) associated with many human diseases. Compared with a one-stage design, which genotypes all case-control samples at all loci, a well-constructed two-stage design can substantially reduce genotyping workload and cost, and is therefore often adopted in GWAS. In stage 1 of such a design, a fraction of the available case-control samples is genotyped at every SNP; based on the association test results from this stage, a small percentage of SNPs is selected and genotyped on the remaining samples in stage 2. Replication-based analysis, which considers the data from each stage separately, often suffers from a loss of efficiency. Joint analysis, which combines the test statistics from both stages, has been proposed to improve power. However, existing joint analyses are based on statistics derived under an assumed, known genetic model, whereas in practice the genetic model of a truly associated SNP is usually unknown (genetic model uncertainty). Such analyses may therefore lose robustness when the assumed genetic model is inappropriate.

This thesis focuses on robust single-locus test approaches for joint analysis in two-stage GWAS and comprises three sub-topics. Firstly, for common genetic variants with minor allele frequency (MAF) greater than 5%, we propose joint analyses based on two robust tests: MERT (the maximin efficiency robust test) and MAX3 (the maximum of the absolute values of the Cochran-Armitage trend test statistics under the recessive, additive and dominant models). We derive the large-sample asymptotic distribution of the MERT joint analysis statistic and give a computationally efficient parametric bootstrap method for calculating the p-value and power of the MAX3 joint analysis.
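The Python sketch below illustrates the ingredients named above for common variants. It is a minimal illustration under stated assumptions, not the thesis's exact procedure: it assumes Hardy-Weinberg equilibrium; genotype scores (0, 0, 1), (0, 1/2, 1) and (0, 1, 1) for the recessive, additive and dominant trend tests; a joint statistic formed model by model as sqrt(pi)*Z1 + sqrt(1 - pi)*Z2, with pi the fraction of samples genotyped in stage 1; and MERT in its common recessive/dominant form (Z_R + Z_D) / sqrt(2(1 + rho_RD)). The function names, the stage-1 threshold `c1` and the p-value definition P(stage-1 MAX3 > c1 and joint MAX3 > observed) are hypothetical choices made for illustration.

```python
import numpy as np

# Genotype scores (AA, Aa, aa) for the Cochran-Armitage trend test (CATT);
# "a" is taken to be the risk allele (assumption for this sketch).
SCORES = {
    "recessive": np.array([0.0, 0.0, 1.0]),
    "additive": np.array([0.0, 0.5, 1.0]),
    "dominant": np.array([0.0, 1.0, 1.0]),
}


def catt_z(cases, controls, scores):
    """CATT Z statistic for a 2 x 3 table of genotype counts (AA, Aa, aa)."""
    r = np.asarray(cases, dtype=float)
    s = np.asarray(controls, dtype=float)
    x = np.asarray(scores, dtype=float)
    n = r + s
    n_cases, n_controls = r.sum(), s.sum()
    n_total = n_cases + n_controls
    t = np.sum(x * (n_controls * r - n_cases * s))
    var_t = (n_cases * n_controls / n_total) * (
        n_total * np.sum(x**2 * n) - np.sum(x * n) ** 2
    )
    return t / np.sqrt(var_t)


def catt_null_correlation(maf):
    """Null correlation of the (recessive, additive, dominant) CATT statistics,
    assuming Hardy-Weinberg equilibrium with risk-allele frequency `maf`."""
    g = np.array([(1 - maf) ** 2, 2 * maf * (1 - maf), maf**2])
    s = np.stack(list(SCORES.values()))  # one row of scores per model
    centered = s - (s @ g)[:, None]      # subtract each model's mean score
    cov = (centered * g) @ centered.T    # covariance of the scores
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def mert(z_rec, z_dom, rho_rd):
    """MERT built from the recessive and dominant CATT statistics."""
    return (z_rec + z_dom) / np.sqrt(2 * (1 + rho_rd))


def max3_joint_pvalue(t_obs, c1, pi, maf, n_sim=200_000, seed=0):
    """Monte Carlo (parametric bootstrap) estimate of
    P(MAX3_stage1 > c1 and MAX3_joint > t_obs) under no association."""
    rng = np.random.default_rng(seed)
    corr = catt_null_correlation(maf)
    zero = np.zeros(3)
    # Independent stage-1 and stage-2 statistics (disjoint samples);
    # "eigh" tolerates the singular correlation matrix (Z_A depends on Z_R, Z_D).
    z1 = rng.multivariate_normal(zero, corr, size=n_sim, method="eigh")
    z2 = rng.multivariate_normal(zero, corr, size=n_sim, method="eigh")
    # Per-model joint statistic, weighting the stages by sample fraction pi.
    z_joint = np.sqrt(pi) * z1 + np.sqrt(1 - pi) * z2
    max3_stage1 = np.abs(z1).max(axis=1)
    max3_joint = np.abs(z_joint).max(axis=1)
    return np.mean((max3_stage1 > c1) & (max3_joint > t_obs))
```

Because the additive score is the average of the recessive and dominant scores, the three trend statistics have a singular null correlation matrix, which is why the sampler uses the eigendecomposition method. Plain Monte Carlo of this kind cannot resolve genome-wide significance levels such as 5e-8 without a very large number of draws, so a practical implementation would need a more efficient tail computation.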
The performance of the proposed approaches, along with four reference methods (joint and replication-based analyses using the allele-based test, and joint and replication-based analyses using the trend test under the additive model), is investigated through extensive simulation studies. The numerical results show that joint analysis is generally more powerful than replication-based analysis and that joint analysis based on the MAX3 statistic has the best overall power. We apply these approaches to a real data set for type 2 diabetes mellitus and identify a new risk SNP based on the p-value of the MAX3 joint analysis.

Secondly, for rare genetic variants (MAF < 5%), we propose replication-based analysis and joint analysis based on the Beta test. We prove that the p-value of the Beta test converges in distribution to the standard uniform distribution. We evaluate the empirical type I error rate and empirical power of the Beta-test replication-based and joint analyses; the numerical results demonstrate that both methods properly control the type I error rate and that joint analysis is more powerful than replication-based analysis. We apply the two proposed approaches to a real rheumatoid arthritis data set and verify that the SNP considered is significantly associated with rheumatoid arthritis.

Thirdly, we propose a robust Bayesian analysis based on asymptotic Bayes factors for two-stage GWAS and define a detection probability to evaluate the performance of asymptotic Bayes factor ranking methods. Through extensive simulations, we compare the detection probabilities of three joint analyses: maximum asymptotic Bayes factor, genetic-model-averaging asymptotic Bayes factor, and additive-model asymptotic Bayes factor. The results show that the maximum asymptotic Bayes factor joint analysis has the most robust performance.
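As a rough illustration of the three asymptotic Bayes factor rankings compared above, the Python sketch below computes, for each SNP and each genetic model, the widely used normal-prior approximate Bayes factor sqrt(V/(V+W)) * exp(z^2 W / (2(V+W))) in favour of association, where z is the Wald statistic (for example, a joint two-stage statistic), V its estimated log odds ratio variance, and W a prior variance. The prior variance W = 0.04, the equal model-averaging weights, the function names and the example statistics are all hypothetical; the thesis's exact Bayes factor definitions and its detection probability are not reproduced here.

```python
import numpy as np

MODELS = ("recessive", "additive", "dominant")


def abf_10(z, v, w=0.04):
    """Approximate Bayes factor for association (H1) over no association (H0).

    Assumes a log odds ratio estimate with sampling variance v and Wald
    statistic z, and a N(0, w) prior on the true log odds ratio."""
    z = np.asarray(z, dtype=float)
    v = np.asarray(v, dtype=float)
    r = w / (v + w)  # shrinkage factor W / (V + W)
    return np.sqrt(1 - r) * np.exp(0.5 * z**2 * r)


def rank_snps(z, v, w=0.04, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank SNPs (best first) by three summaries of the per-model ABFs.

    z, v: arrays of shape (n_snps, 3) with columns ordered as MODELS."""
    bf = abf_10(z, v, w)
    summaries = {
        "max": bf.max(axis=1),                # maximum over genetic models
        "average": bf @ np.asarray(weights),  # genetic model averaging
        "additive": bf[:, MODELS.index("additive")],
    }
    # Sorting the negated scores lists the largest Bayes factor first.
    return {name: np.argsort(-score) for name, score in summaries.items()}


# Hypothetical statistics for three SNPs: recessive-like, dominant-like, null.
z = np.array([[4.5, 2.0, 0.5], [1.0, 2.5, 2.8], [0.2, 0.1, 0.3]])
v = np.full_like(z, 0.01)
rankings = rank_snps(z, v)
for name, order in rankings.items():
    print(name, order.tolist())
# max [0, 1, 2]
# average [0, 1, 2]
# additive [1, 0, 2]
```

In this toy example the additive-model ranking places the strongly recessive-looking SNP 0 below SNP 1, whereas the maximum over models ranks it first, which mirrors the kind of robustness to genetic model uncertainty the comparison is concerned with.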
We apply these three methods to a real data set, and the results indicate that the maximum asymptotic Bayes factor ranking method can efficiently detect associations between diseases and SNPs that act under recessive or dominant models.

The thesis is divided into six chapters. Chapter 1 introduces basic concepts and the research background. Chapter 2 presents preparatory material on several test statistics and methods for GWAS. Chapter 3 is devoted to two-stage design and analysis for common genetic variants. Chapter 4 discusses two-stage design and analysis for rare genetic variants. Chapter 5 is devoted to two-stage design and analysis based on asymptotic Bayes factors. Chapter 6 summarizes the thesis and outlines future work.
Keywords/Search Tags: Case-control study, Genome-wide association study, Single-nucleotide polymorphism, Two-stage design, Genetic model uncertainty, Robust test, Joint analysis, Rare variants, Beta test, Maximum asymptotic Bayes factor