SNP-set Tests for Sequencing and Genome-Wide Association Studies

Posted on: 2015-12-03
Degree: Ph.D.
Type: Thesis
University: Harvard University
Candidate: Barnett, Ian James
GTID: 2473390020451174
Subject: Biology

Abstract/Summary:

In this dissertation, we propose methods for testing SNP-sets for genetic association in both sequencing studies and genome-wide association studies. Because these data are large in scale, we emphasize methods that are not only accurate and powerful but also computationally efficient.

In Chapter 1, we use extreme phenotype sampling to increase the power to identify rare variants associated with complex traits. We confirm both analytically and numerically that sampling individuals with extreme phenotypes can enrich the sample for causal rare variants and can therefore increase power relative to random sampling. Applying traditional rare variant association tests to extreme phenotype samples requires dichotomizing the continuous phenotypes before analysis, and this dichotomization discards phenotypic information and can reduce power. To avoid this, we propose a novel statistical method based on the optimal sequence kernel association test (SKAT-O) that tests for rare variant effects using the continuous phenotypes of extreme phenotype samples. We demonstrate the resulting gain in power through simulations across a wide range of scenarios and in an analysis of triglyceride data from the Dallas Heart Study.

In Chapter 2, we apply the higher criticism, a signal detection method that is effective for testing a joint null hypothesis against a sparse alternative, to SNP-set testing. The test is useful for assessing the effect of a gene or genetic pathway consisting of d genetic markers.
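For readers unfamiliar with the statistic, here is a minimal sketch of the higher criticism in the p-value form introduced by Donoho and Jin. It assumes one z-statistic per SNP, converts each to a two-sided normal p-value, and takes the maximum over all d ordered p-values. The thesis may restrict the range of indices or use an equivalent threshold form, and the function names are illustrative only.

```python
import math


def two_sided_pvalue(z):
    """Two-sided p-value of a standard-normal z-statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def higher_criticism(z_scores):
    """HC = max_i sqrt(d) * (i/d - p_(i)) / sqrt(p_(i) * (1 - p_(i))).

    Assumed variant: the maximum runs over all d sorted p-values; p-values that
    are exactly 0 or 1 (e.g. z == 0) are skipped to avoid dividing by zero.
    """
    pvals = sorted(two_sided_pvalue(z) for z in z_scores)
    d = len(pvals)
    hc = -math.inf
    for i, p in enumerate(pvals, start=1):
        if 0.0 < p < 1.0:
            # Standardized excess of the empirical fraction i/d over p_(i).
            hc = max(hc, math.sqrt(d) * (i / d - p) / math.sqrt(p * (1.0 - p)))
    return hc


# Hypothetical per-SNP z-scores for a small SNP-set with two strong signals.
print(higher_criticism([0.4, -1.1, 3.8, 0.2, -0.7, 4.1, 1.3, -0.3]))
```

A few small p-values among many null ones produce a large standardized gap at a small index i, which is why the statistic targets sparse alternatives.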
P-values for the higher criticism based on its asymptotic distribution are accurate only when d is very large, which is usually not the case for the number of genetic variants in a gene or pathway. We propose an analytic method that accurately computes the p-value of the higher criticism test for finite d. Unlike previous treatments of the higher criticism, this method relies neither on asymptotics in d nor on simulation, and it is exact for arbitrary d when the test statistics are normally distributed. The method is also especially computationally advantageous when d is not large. We illustrate the proposed method with a case-control genome-wide association study of lung cancer and compare its power with that of competing methods through simulations.

In Chapter 3, we adapt the higher criticism to better accommodate correlation among the SNPs in a SNP-set. The method of Chapter 2 first decorrelates the SNPs in the SNP-set, which loses power. We propose the generalized higher criticism (GHC), which does not require asymptotics in the number of SNPs in the SNP-set and simultaneously allows arbitrary correlation structures among them. We derive the detection boundary of the test and compare its power with that of existing SNP-set tests over simulated regions with varied correlation structures and signal sparsity. We also compare the relative performance of these methods in an analysis of the CGEM breast cancer genome-wide association study.

Keywords/Search Tags: Genome-wide association, SNP-set, Method, Test, Higher criticism, Propose, Genetic
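To make the Chapter 2 idea of an exact finite-d p-value concrete: with HC = max_i sqrt(d) (i/d - p_(i)) / sqrt(p_(i)(1 - p_(i))), the event HC <= h holds exactly when every ordered p-value stays above a boundary, p_(i) >= c_i. Under the global null with independent normal test statistics, the d p-values are independent Uniform(0, 1), so the p-value reduces to a boundary-crossing probability for uniform order statistics. The sketch below computes that probability with Bolshev's recursion. This is a textbook technique swapped in for illustration and not necessarily the algorithm developed in the thesis. The function names are hypothetical and h > 0 is assumed. Because the recursion loses floating-point precision as d grows, the sketch is intended for small to moderate d (a few dozen markers).

```python
import math


def hc_lower_boundaries(h, d):
    """Return [c_1, ..., c_d] with HC <= h  <=>  p_(i) >= c_i for all i.

    Assumes h > 0 and HC = max_i sqrt(d) (i/d - p_(i)) / sqrt(p_(i) (1 - p_(i))).
    Squaring sqrt(d) (i/d - c) = h sqrt(c (1 - c)) gives
    (d + h^2) c^2 - (2i + h^2) c + i^2/d = 0; c_i is its smaller root.
    """
    bounds = []
    for i in range(1, d + 1):
        b = 2.0 * i + h * h
        cst = i * i / d
        disc = h ** 4 + 4.0 * i * h * h * (1.0 - i / d)  # simplified discriminant
        # Smaller root written as 2C / (B + sqrt(D)) to avoid cancellation.
        bounds.append(2.0 * cst / (b + math.sqrt(disc)))
    return bounds


def hc_pvalue(h, d):
    """Exact P(HC >= h) for d independent Uniform(0, 1) null p-values (sketch)."""
    if h <= 0:
        raise ValueError("this sketch assumes h > 0")
    c = hc_lower_boundaries(h, d)  # nondecreasing in i
    # Reflect V = 1 - p: p_(i) >= c_i for all i  <=>  V_(j) <= a_j = 1 - c_{d+1-j}.
    a = [1.0 - c[d - j] for j in range(1, d + 1)]
    # Bolshev recursion: F[k] = P(V_(j) <= a_j for j = 1..k) for k uniforms.
    F = [1.0] + [0.0] * d
    for k in range(1, d + 1):
        s = sum(math.comb(k, m) * F[m] * (1.0 - a[m]) ** (k - m) for m in range(k))
        F[k] = 1.0 - s
    # HC is continuous under the null, so P(HC >= h) = 1 - P(HC <= h).
    return min(1.0, max(0.0, 1.0 - F[d]))


print(hc_pvalue(3.0, 10))  # null tail probability of HC = 3.0 over 10 markers
```

Reflecting the p-values turns the lower boundary on p_(i) into an upper boundary, which is the form the recursion handles directly. The cost grows roughly quadratically in d, which is negligible for gene-sized SNP-sets and consistent with the abstract's point that an exact finite-d computation is most attractive when d is not large.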