An Optimal Principal Component Regression For Genomic Control In Genome-wide Association Analysis

Posted on: 2020-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Wang
Full Text: PDF
GTID: 2370330590983471
Subject: Aquaculture
Abstract/Summary:
Confounding factors such as population stratification, family structure and cryptic relatedness are widespread in genome-wide association analysis. By inflating test statistics, they increase the false positive rate of association tests and reduce the statistical power to detect quantitative trait nucleotides (QTNs). Genomic control is a popular method for measuring the extent of inflation caused by these confounders and for correcting their effects. Many methods of genomic control exist, such as structured association, inference of genetic ancestry, family-based association tests and mixed model association analysis, each with its own advantages. By comparison, mixed model association analysis is more comprehensive and efficient. However, its heavy computational requirements hinder its application to the nearly complete sets of polymorphic markers that efficient, low-cost whole-genome resequencing produces in increasingly large experimental samples. A series of simplified mixed model association analysis algorithms, such as GRAMMAR, EMMAX, FaST-LMM and BOLT-LMM, have therefore been proposed.

Simple linear regression of the phenotype on the tested marker is the simplest approach to genome-wide association analysis, whereas linear mixed model association analysis accounts for polygenic effects by treating the total genetic effects of all high-throughput markers other than the tested marker as random effects. A spectral transformation based on the realized relationship matrix makes the random polygenic effects independent among individuals, from which it follows that a linear regression model on all principal components of the realized relationship matrix is equivalent to the genomic mixed model. This suggests that the principal components can account for other confounders in addition to population stratification. Because the regression model on all principal components is saturated (the number of principal components equals the sample size), we select a number of top principal components to correct for confounders, choosing the number that brings the genome-wide mean chi-squared statistic close to 1 or yields a satisfactory Q-Q plot. We call this method optimal principal component-based genomic control. Unlike principal component regression used only to correct for population stratification, this method achieves genomic control similar to that of genome-wide mixed model association analysis. It extends easily to binary traits, which indirectly realizes genome-wide mixed model association analysis for complex disease traits. Unlike ordinary quantitative traits, whose phenotypes can be corrected directly by the principal components, binary traits require first selecting the principal components with a generalized linear model and then including the regression on the selected principal components as an adjustment variable in the generalized linear model used for genome-wide association analysis.

We demonstrate the statistical utility of optimal principal component-based genomic control by computer simulation. Continuous quantitative traits and binary traits are simulated from genomic datasets of mouse and maize, respectively. The simulations show that: (1) under optimal genomic control, the optimal method has statistical power to detect QTNs similar to that of the FaST-LMM algorithm, and the remaining difference in power narrows as the number of QTNs and the population size increase; (2) the optimal method maintains better genomic control than the FaST-LMM algorithm when the realized relationship matrix is calculated from a subset of markers sampled randomly from the whole genome.

Genome-wide association analyses of growth traits in rainbow trout detected 10, 1 and 1 QTNs for body height, body length and body weight, respectively, with the optimal method, and 13, 0 and 0 QTNs for the same traits with the FaST-LMM algorithm. These results demonstrate that: (1) the optimal method corrects for confounders better than the FaST-LMM algorithm; (2) the FaST-LMM algorithm did not consistently identify more QTNs than the optimal method across the analyzed traits.
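
As a rough illustration of the selection rule described in the abstract, the following Python sketch picks the number of top principal components of the realized relationship matrix that brings the genome-wide mean chi-squared closest to 1. It is not the thesis's implementation. The function names, the toy two-population data, the relationship matrix built from standardized genotypes, and the choice to regress the principal components out of both the phenotype and the markers are all assumptions made for the example. Only quantitative traits are covered.

```python
import numpy as np


def simulate_toy_data(n=120, m=300, seed=0):
    """Hypothetical toy data: two subpopulations with different allele
    frequencies and a phenotype shifted by population. No marker is causal."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n // 2)
    freq = rng.uniform(0.1, 0.9, size=(2, m))
    genotypes = rng.binomial(2, freq[pop]).astype(float)  # (n, m), coded 0/1/2
    y = 1.5 * pop + rng.normal(size=n)
    return genotypes, y


def realized_relationship_matrix(genotypes):
    """Relationship matrix from standardized genotypes (one common choice,
    assumed here)."""
    z = genotypes - genotypes.mean(axis=0)
    sd = z.std(axis=0)
    z = z[:, sd > 0] / sd[sd > 0]  # drop monomorphic markers
    return z @ z.T / z.shape[1]


def marker_chi2(y, genotypes, pcs):
    """Per-marker squared t statistic (approximately 1-df chi-squared)
    after regressing an intercept and the given PCs out of y and markers."""
    n = len(y)
    covars = np.column_stack([np.ones(n), pcs])
    q, _ = np.linalg.qr(covars)
    y_res = y - q @ (q.T @ y)
    g_res = genotypes - q @ (q.T @ genotypes)
    df = n - covars.shape[1] - 1
    sxx = (g_res ** 2).sum(axis=0)
    sxy = g_res.T @ y_res
    ok = sxx > 1e-10  # skip markers fully explained by the covariates
    beta = sxy[ok] / sxx[ok]
    rss = (y_res ** 2).sum() - beta * sxy[ok]
    return beta ** 2 * sxx[ok] / (rss / df)


def select_optimal_k(y, genotypes, grm, candidates):
    """Return (k, mean chi-squared) for the candidate number of top PCs
    whose genome-wide mean chi-squared is closest to 1."""
    _, vecs = np.linalg.eigh(grm)  # eigenvalues come back in ascending order
    vecs = vecs[:, ::-1]           # largest eigenvalue first
    best = None
    for k in candidates:
        mean_chi2 = marker_chi2(y, genotypes, vecs[:, :k]).mean()
        if best is None or abs(mean_chi2 - 1) < abs(best[1] - 1):
            best = (k, mean_chi2)
    return best


if __name__ == "__main__":
    genotypes, y = simulate_toy_data()
    grm = realized_relationship_matrix(genotypes)
    k, mean_chi2 = select_optimal_k(y, genotypes, grm, [0, 1, 2, 5, 10])
    print(f"selected {k} principal components, mean chi-squared {mean_chi2:.3f}")
```

In practice the candidate grid would be finer, and a Q-Q plot of the resulting statistics could replace or complement the mean chi-squared criterion, as the abstract notes.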
Keywords/Search Tags: genome-wide association analysis, genomic control, principal component regression, rainbow trout, mixed model association analysis