Investigation Of The Effects Of Missing Call Bias And Estimation Of CNV Mutation Rate In Human Genome Analysis

Posted on: 2011-10-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Q Fu
GTID: 1220360305997424
Subject: Genetics
Abstract/Summary:
The technological revolution has brought genetics into a new era known as '-omics'. The application of microarray technology has produced a large volume of genetic data. To mine these data in depth, methods from other fields, such as statistics and informatics, have been applied to the study of genetics. This thesis summarizes my Ph.D. work in two parts, which illustrate how statistics is applied to genetic research.

Ⅰ. We investigated missing call bias in high-throughput genotyping and its effects on downstream analyses. The advent of high-throughput, cost-effective genotyping platforms made genome-wide association (GWA) studies a reality. While the primary focus has been on reducing genotyping error, the problems associated with missing calls have been largely overlooked. To probe the effect of missing calls on GWA studies, we demonstrated experimentally the prevalence and severity of missing call bias (MCB) in four genotyping technologies (TaqMan, SNPstream, Illumina Beadlab and Affymetrix Human Mapping 500K SNP array). We then showed theoretically that MCB leads to biased conclusions in subsequent analyses, including the estimation of allele and genotype frequencies, tests of Hardy-Weinberg equilibrium (HWE) and association tests under various modes of inheritance. We showed that MCB usually leads to power loss in association tests, and that this change in power is greater than that caused by an equivalent unbiased reduction in sample size. We also compared the bias in allele frequency estimation and in association tests introduced by MCB with that introduced by genotyping errors. Our results illustrated that in most cases, the bias can be greatly reduced by increasing the call rate at the cost of a higher genotyping error rate. The commonly used 'no-call' procedure for observations of borderline quality should be modified.
If the objective is to minimize bias, the cut-off for call rate and the cut-off for genotyping error rate should be properly coupled in GWA studies. We suggest that the current QC cut-off for call rate be raised, while the cut-off for genotyping error rate can be relaxed correspondingly.

Ⅱ. We proposed a novel statistical method to approximately estimate the mutation rate of copy number variants (CNVs). CNVs in the human genome have been found to contribute to both Mendelian and complex traits, as well as to genomic plasticity in evolution. Investigating the mutational mechanisms of CNVs and estimating their mutation rates are critical to understanding the etiology of CNV-associated traits. Much progress has been made in unraveling the mechanisms of CNV formation; however, evaluating their mutation rates at the genome level poses a daunting practical challenge, because it requires large sample sizes and accurate typing. In this study, we showed that an approximate estimate of CNV mutation rates can be obtained from population genotyping data. This estimate is sufficient to compare mutation rates between CNVs across the genome and thereby identify mutational hotspots. In an analysis of 4,330 CNVs from HapMap populations, we showed that the mutation rates of most CNVs are on the order of 10^-5 per generation, which is consistent with observations from molecular assays. Notably, the mutation rates of 132 (3.0%) CNVs are on the order of 10^-3 per generation; these CNVs were therefore identified as hotspots. Further analysis revealed that differences in genome architecture and rearrangement mechanism likely gave rise to CNV hotspots in the human genome.

In the near future, masses of data produced by next-generation sequencing will emerge. These data are likely to resolve many unknowns in genetics. The analysis of such volumes of data will rely on statistics and informatics.
Let us prepare for the advent of this golden age of biology.
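
To make the missing call bias of Part Ⅰ concrete, the following is a minimal sketch, not the derivation used in the thesis. It assumes a biallelic SNP in Hardy-Weinberg proportions and computes the allele frequency expected among the genotypes that are actually called when each genotype class has its own no-call rate. The function name and the no-call rates are hypothetical and chosen only for illustration.

```python
def observed_allele_freq(p, miss_AA, miss_Aa, miss_aa):
    """Expected frequency of allele A among called genotypes.

    Illustrative assumptions (not taken from the thesis):
    - genotypes are in Hardy-Weinberg proportions before calling;
    - miss_AA, miss_Aa, miss_aa are hypothetical no-call rates
      for the three genotype classes.
    """
    q = 1.0 - p
    # Genotype frequencies before any calls are dropped.
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q
    # Fraction of the sample that is called, per genotype class.
    c_AA = f_AA * (1 - miss_AA)
    c_Aa = f_Aa * (1 - miss_Aa)
    c_aa = f_aa * (1 - miss_aa)
    called = c_AA + c_Aa + c_aa
    # Allele A frequency among the called genotypes only.
    return (c_AA + 0.5 * c_Aa) / called


if __name__ == "__main__":
    true_p = 0.3
    # Missing calls that do not depend on genotype leave the estimate unbiased.
    print(observed_allele_freq(true_p, 0.05, 0.05, 0.05))
    # Missing calls concentrated in heterozygotes shift the estimate.
    print(observed_allele_freq(true_p, 0.0, 0.10, 0.0))
```

When the three no-call rates are equal, the estimate equals the true frequency and only the sample size shrinks. When they differ, the estimate moves away from the true frequency, which is why a genotype-dependent loss of calls is not equivalent to an unbiased reduction in sample size.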
Keywords/Search Tags: Single nucleotide polymorphism (SNP), Missing call bias (MCB), Genotyping error, Copy number variant (CNV), Mutation rate, Ancestral Recombination Graph (ARG)