A Multi-locus Jonckheere-Terpstra Method for Genome-wide Association Study

Posted on: 2017-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
GTID: 2370330518478227
Subject: Crop Genetics and Breeding
Abstract:
Most complex traits of humans, animals and plants are quantitative traits controlled by polygenes. Studies of quantitative traits are of great significance for the genetic improvement of animals and plants and for the prevention and control of complex human diseases. Currently, the most commonly used method for studying quantitative traits is the genome-wide association study (GWAS). Applied research shows that nonparametric tests are effective complements to parametric GWAS methods. Compared with parametric single-marker scanning, the advantages of nonparametric tests are particularly evident when the phenotype of a quantitative trait is non-normally distributed and the allele frequency is low. However, the nonparametric tests currently applied in GWAS have several problems: an extremely high false positive rate, an inability to estimate the effects of quantitative trait nucleotides (QTNs), and lower power than parametric single-marker scanning for detecting large-effect QTNs. To overcome these shortcomings, the exact distribution of the Jonckheere-Terpstra test statistic can be combined with a multi-locus model to improve the power of QTN detection, while shrinkage estimation both reduces the high false positive rate of nonparametric GWAS and provides estimates of QTN effects. We therefore propose the multi-locus Jonckheere-Terpstra test for GWAS (mJTGWAS).

SNP genotypes derived from Arabidopsis thaliana data were used to generate phenotypic values from six QTNs under various genetic backgrounds (none, polygenic and epistatic). Each of the 1000 simulated samples was analyzed by the Anderson-Darling test for GWAS (ADGWAS), efficient mixed-model association (EMMA), the Jonckheere-Terpstra test for GWAS (JTGWAS) and mJTGWAS. The power, positions and estimated effects of the QTNs were then obtained to confirm the effectiveness of the new method. Six flowering-related traits of Arabidopsis thaliana and three important traits of maize were used to further confirm its effectiveness. The results are as follows.

1) In mJTGWAS, the observed phenotypic values were first sorted by size after removing the node data, and the first 50 and last 50 values were selected (if fewer than 100 values remained after removing the node data, all of them were used). The Jonckheere-Terpstra exact test was then applied to the observed phenotypic values and each SNP marker, and the markers were sorted by P-value in ascending order. Lastly, when the number of markers with low P-values was 2 to 7 times the sample size, those markers were placed into the multi-locus genetic model, and the SNPs associated with the trait and their related parameters were obtained by the empirical Bayes method and the likelihood ratio test. This procedure is called the mJTGWAS method.

2) Monte Carlo simulation studies indicated that the new method had 4.02%, 18.69% and 2.32% higher average power in the detection of the six QTNs than ADGWAS, EMMA and JTGWAS, respectively; 7.84%, 22.75% and 18.55% higher average power under the polygenic background; and 0.7%, 17.08% and 4.86% higher average power under the epistatic background. The new method had 0.02 and 0.47 lower average mean square error in the estimation of the six QTN effects than mrMLM and EMMA, respectively; 0.01 and 0.50 lower mean square error under the polygenic background; and 0.01 and 0.51 lower mean square error under the epistatic background. Finally, the new method had 0.58%, 0.02% and 0.30% lower false positive rates than ADGWAS, EMMA and JTGWAS, respectively; 0.71%, 0% and 0.36% lower false positive rates under the polygenic background; and 0.87%, 0.01% and 0.56% lower false positive rates under the epistatic background. In the simulation with no genetic background, the computing time of the new method was 9.30 hours, 59.47 hours less than that of EMMA. These results indicate that the new method can improve the power and accuracy of GWAS with high computational efficiency. In addition, no Bonferroni correction for multiple hypothesis testing is needed.

3) Six flowering-time-related traits of Arabidopsis thaliana were analyzed with both the new method and EMMA, and three important traits of maize were analyzed with both the new method and ADGWAS. The new method detected 51 and 36 significantly associated SNPs, respectively, which were 42 and 2 more than those detected by EMMA and ADGWAS. The significantly associated SNPs were used in multiple linear regression analyses, and the corresponding Bayesian information criterion (BIC) values were calculated; the new method gave the lowest BIC values. The number of genes previously reported to be associated with the Arabidopsis traits in the proximity of the SNPs detected by the new method was 31, which was 25 more than that of EMMA. The number of QTLs previously reported to be associated with the maize traits in the proximity of the SNPs detected by the new method was 5, which was 1 more than that of ADGWAS. These results indicate that the SNPs detected by the new method fit the data better than those detected by the other methods, and that the new method is more powerful in detecting genes.
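The single-marker screening step of the procedure described in the abstract (test each SNP with the Jonckheere-Terpstra statistic, then rank markers by P-value in ascending order) can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis uses the exact distribution of the statistic, whereas this sketch substitutes the standard large-sample normal approximation without tie correction, and it omits the multi-locus empirical Bayes stage entirely. The function names `jt_statistic`, `jt_p_value` and `screen_markers`, the 0/1/2 genotype coding and the `keep` parameter are all assumptions made for illustration.

```python
import math


def jt_statistic(phenotypes, genotypes):
    """Return the Jonckheere-Terpstra statistic J and the group sizes.

    Phenotypes are grouped by genotype code, and groups are ordered by
    that code (for example 0 < 1 < 2 copies of an allele; an assumption).
    """
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    ordered = [groups[k] for k in sorted(groups)]

    j = 0.0
    for a in range(len(ordered)):
        for b in range(a + 1, len(ordered)):
            for x in ordered[a]:
                for y in ordered[b]:
                    if x < y:
                        j += 1.0   # pair agrees with the group ordering
                    elif x == y:
                        j += 0.5   # ties count as half
    return j, [len(g) for g in ordered]


def jt_p_value(phenotypes, genotypes):
    """Two-sided P-value from the normal approximation (not the exact test)."""
    j, sizes = jt_statistic(phenotypes, genotypes)
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    if var <= 0.0:
        return 1.0  # a monomorphic marker carries no ordering information
    z = (j - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))


def screen_markers(phenotypes, marker_matrix, keep):
    """Rank markers by ascending P-value; return the `keep` best indices and all P-values."""
    pvals = [jt_p_value(phenotypes, marker) for marker in marker_matrix]
    order = sorted(range(len(pvals)), key=lambda i: pvals[i])
    return order[:keep], pvals


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    markers = [
        [0, 1, 2, 0, 1, 2],  # weak trend with the phenotype
        [0, 0, 1, 1, 2, 2],  # perfectly ordered with the phenotype
    ]
    selected, pvals = screen_markers(y, markers, keep=1)
    print(selected, pvals)
```

In the method as described, the markers retained by such a ranking would then be fitted jointly in the multi-locus model, which is where effect estimates and the final significance decisions come from.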
Keywords: Genome-wide association study, multi-locus model, nonparametric method, efficient mixed-model association