| Genome-wide association studies (GWAS) are recognized as a robust tool for identifying genetic factors that contribute to complex traits. By the end of October 2014, about 19,602 single nucleotide polymorphisms (SNPs) associated with 1,251 traits had been detected. For any single trait, however, only a minority of SNPs pass the multi-stage validation of GWAS, and these explain only a small proportion of heritability. Many studies suggest that SNPs validated by GWAS have low power for genetic risk prediction of some traits, and neglecting the large number of small-effect SNPs is regarded as a main reason. How to make fuller use of the information in GWAS data is therefore key to successful prediction. Recently, two strategies have been proposed: one relaxes the significance threshold of the hypothesis test, and the other predicts with all SNPs through a linear mixed model (LMM). Based on these two strategies, we propose two methods, sGRS and sGRS-LMM, and assess their performance in genetic risk prediction of complex traits. This research has two purposes: first, to compare the prediction accuracy of sGRS and sGRS-LMM with that of other methods; second, to discuss underlying factors that affect prediction accuracy. In this study, simulation trials are used to compare the prediction performance of BLUP, AM-BLUP, wGRS, RF, sGRS and sGRS-LMM, and the methods are then applied to real GWAS data on non-small cell lung cancer (NSCLC) in a Han Chinese population. The main contents of this study are as follows: 1. Simulations based on Chromosome 1: We use the Chromosome 1 genotypes from the real GWAS data and generate quantitative and binary phenotypes by simulation under different settings of four parameters: sample size, heritability, number of risk loci and population prevalence. The six methods are then applied to the simulated data. 2.
The real data analysis: We apply the six methods to real GWAS data on NSCLC in a Han Chinese population. The Nanjing population serves as the training set for building the prediction models of the six methods, and the Beijing population serves as the test set for evaluating their prediction accuracy. The main results of this study are as follows: 1. Results of simulation trials: Under most simulation conditions, the prediction accuracy of sGRS and sGRS-LMM is better than that of the other methods; sample size, heritability, number of risk loci and population prevalence all affect the prediction accuracy of the six methods; and quantitative and binary phenotypes show similar trends across the six methods. 2. Results of the real data analysis: The prediction accuracy of sGRS and sGRS-LMM is better than that of the other methods, and sGRS-LMM achieves the highest value of all methods (AUC = 0.735). There is nevertheless a large gap between the accuracy of sGRS-LMM and the theoretical prediction accuracy. Conclusion: The results of the simulation trials and the real data analysis suggest that sGRS and sGRS-LMM are effective for genetic risk prediction from GWAS data. |
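
The phenotype simulation described above can be illustrated with a minimal sketch. It is not the procedure used in the study, whose details the abstract does not give. It assumes an additive model with normally distributed effects at randomly chosen risk loci, and it derives the binary phenotype from a liability threshold, labelling the top `prevalence` fraction of individuals as cases. The function names, the random genotype generator (a stand-in for the real Chromosome 1 genotypes) and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_genotypes(n_individuals, n_snps, maf=0.3, seed=1):
    """Hypothetical stand-in for real genotypes: independent SNPs coded 0/1/2."""
    rng = random.Random(seed)
    return [
        [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
        for _ in range(n_individuals)
    ]


def simulate_phenotypes(genotypes, n_causal, h2, prevalence, seed=0):
    """Return (causal SNP indices, genetic values, quantitative trait, case status)."""
    rng = random.Random(seed)
    n_snps = len(genotypes[0])

    # Pick the risk loci and draw an additive effect for each one (assumed N(0, 1)).
    causal = rng.sample(range(n_snps), n_causal)
    effects = {j: rng.gauss(0.0, 1.0) for j in causal}

    # Raw genetic value: weighted sum of risk-allele counts at the causal loci.
    raw = [sum(effects[j] * row[j] for j in causal) for row in genotypes]

    # Rescale so the genetic values have mean 0 and variance h2.
    mean_raw = statistics.fmean(raw)
    sd_raw = statistics.pstdev(raw)
    g = [(x - mean_raw) / sd_raw * h2 ** 0.5 for x in raw]

    # Add environmental noise with variance 1 - h2 to get the quantitative trait.
    e_sd = (1.0 - h2) ** 0.5
    y = [x + rng.gauss(0.0, e_sd) for x in g]

    # Liability threshold: the top `prevalence` fraction of individuals are cases.
    n_cases = round(prevalence * len(y))
    threshold = sorted(y, reverse=True)[n_cases - 1]
    case = [1 if v >= threshold else 0 for v in y]
    return causal, g, y, case


# Example settings: 500 individuals, 200 SNPs, 20 risk loci, h2 = 0.5, prevalence = 0.1.
genotypes = simulate_genotypes(n_individuals=500, n_snps=200)
causal, g, y, case = simulate_phenotypes(genotypes, n_causal=20, h2=0.5, prevalence=0.1)
print(len(causal), sum(case))
```

The empirical threshold fixes the number of cases exactly. Using a standard normal quantile as the threshold would instead let the case count vary around the target prevalence.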