
Gene-environment Interaction Analysis Based On Sparse Principal Component Varying-Coefficient Model

Posted on: 2017-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Sa
Full Text: PDF
GTID: 1220330503463231
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: The analysis of interactions between genes and environmental factors (G×E) has become one of the central topics in genetics research. The most basic approach uses epidemiological concepts to define interaction terms and then applies classical statistical methods to detect interactions and estimate their effect sizes. However, G×E interactions cannot be fully described by an additive or a multiplicative model alone; they may also be nonlinear. This study considers two key features: the nonlinear relationship between environmental factors and genes, and the high dimensionality of SNPs and genes. To address both, we combine Sparse Principal Component Analysis with a Varying Coefficient Model to assess the nonlinear influence of genes as modified by environmental factors, providing a new tool for G×E identification for the genetics community.

Methods: Because genes are the functional units of living organisms, we take each gene as the unit of analysis. We apply Sparse Principal Component Analysis (SPCA) to the SNPs of each gene and then build a nonlinear model of the interaction between the gene's sparse principal components (SPCs) and the environmental factor. Different models allow us to assess different G×E effects, linear or nonlinear. We then estimate the regression coefficients of each gene with a nonparametric B-spline method and carry out hypothesis tests. The real data come from the GENEVA study on birth weight. A newborn's birth weight is affected not only by the child's own genes but also by the maternal uterine environment. The data cover 1,126 individuals with genome-wide gene and SNP information. We treat the mother's one-hour oral glucose tolerance test (OGTT) glucose level as the environmental factor and combine it with the fetal genes to build the model. We hypothesize that the maternal glucose level can modify, linearly or nonlinearly, the effect of fetal genes on fetal growth and birth weight. After data cleaning, 12,005 genes remain.
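The B-spline step described above can be sketched as follows, in Python rather than the R used in the study. The sketch assumes a model of the form y = β0(x) + Σ_k β_k(x)·SPC_k + error, where x is the environmental variable and each coefficient function β_k(x) is a linear combination of B-spline basis functions. That model form, the cubic degree, the knot placement, the rescaling of x to [0, 1] and all function names are illustrative assumptions, not details taken from the dissertation.

```python
def clamped_knots(lo, hi, interior, degree):
    """Knot vector with (degree + 1) repeated boundary knots (hypothetical helper)."""
    return [lo] * (degree + 1) + list(interior) + [hi] * (degree + 1)


def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x with the Cox-de Boor recursion."""
    # Degree 0: indicator functions of the half-open knot spans.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    # Close the right boundary so that x == knots[-1] falls in the last real span.
    if x == knots[-1]:
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                basis[i] = 1.0
                break
    # Raise the degree one step at a time; zero-width spans contribute nothing.
    for d in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - d - 1):
            left_width = knots[i + d] - knots[i]
            right_width = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_width * basis[i] if left_width > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_width * basis[i + 1]
                     if right_width > 0 else 0.0)
            raised.append(left + right)
        basis = raised
    return basis


def vc_design_row(x, spc_scores, knots, degree=3):
    """One design-matrix row for the assumed varying-coefficient model.

    The first block of columns carries the intercept function beta_0(x);
    each later block carries beta_k(x) multiplied by the k-th SPC score.
    """
    b = bspline_basis(x, knots, degree)
    row = list(b)
    for score in spc_scores:
        row.extend(bj * score for bj in b)
    return row


# Hypothetical inputs: glucose rescaled to [0, 1], two SPC scores for one gene.
knots = clamped_knots(0.0, 1.0, [0.5], degree=3)
row = vc_design_row(0.3, [2.0, -1.0], knots, degree=3)
print(len(row))
```

Stacking one such row per individual gives a design matrix that ordinary least squares can fit. Under these assumptions, testing whether a gene's coefficient functions are constant in x amounts to comparing nested linear models.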
Both PCA and SPCA are applied, and the model is fitted to the resulting PCs or SPCs together with the environmental factor. We conduct simulations to confirm the performance of the model. Parameter estimation, hypothesis testing and the simulation studies are carried out in R, and sparse principal component analysis is performed with the “elasticnet” package.

Results:
1. We used the sparse principal component varying-coefficient model to analyze fetal birth weight, treating the sparse principal components (SPCs) of each gene as the gene effect and the mother's OGTT glucose level as the environmental factor. The Manhattan plot shows two significant genes: ANGPT1 on chromosome 8 (67 SNPs; seven principal components extracted, explaining over 80% of the variance) and NCOA5 on chromosome 20 (15 SNPs; four principal components extracted, explaining over 80% of the variance).
2. For comparison, the ordinary principal components (PCs) of each gene were also used as the genetic effect in the model. The Manhattan plot shows the same significant genes as the SPC method.
3. We fitted different models to assess the effects. Not only the overall gene effect but also the genetic main effects and the interactions between gene and environmental factor (maternal glucose) are significant. For ANGPT1, the main effect (P=0.0003) is more significant than the interaction effect (P=0.002), whereas for NCOA5 the main effect (P=0.003) is less significant than the interaction effect (P=0.00015).
4. Further significance tests on each SPC show that four of the seven SPCs in ANGPT1 and three of the four SPCs in NCOA5 are significant.
The non-zero loadings in each significant SPC indicate the relative importance of the corresponding SNPs that contribute to the gene effect.
5. For both ANGPT1 and NCOA5, the fitted birth weight shows a slightly increasing trend as the maternal glucose level increases, and this trend is nonlinear, indicating a nonlinear interaction between the genes and the environmental factor.
6. Simulation studies show that power increases as the sample size grows from 200 to 500 and 1,000. As the error variance goes from 32s to 2s, the power for the total effect and for the interaction effect also increases. For the same sample size, error variance and t, the power for the interaction effect is significantly higher than that for the overall effect, supporting the performance of the model for the interaction effect.

Conclusions: In this study, we developed a varying-coefficient model coupled with sparse principal component analysis to study gene-based gene-environment interaction. The sparsity of the PC loadings, together with significance tests, reveals the relative importance of the corresponding SNPs in each gene. The model can also evaluate potential nonlinear G×E interactions at the level of genes rather than single SNPs. The method is therefore biologically more attractive, since genes are the functional units of most living organisms. Real data analysis and simulation verify the feasibility and performance of the SPC-VC model for gene-environment interaction, and the model provides a novel and powerful tool for G×E studies of complex diseases.
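The component counts reported in the results (seven for ANGPT1, four for NCOA5, each explaining over 80% of the variance) suggest a cumulative explained-variance rule. The Python sketch below assumes the rule is "keep the smallest number of components whose cumulative share of variance reaches the threshold"; the abstract does not state the exact rule, and the function name and the variances in the example are hypothetical. For sparse components, which need not be uncorrelated, the inputs would have to be suitably adjusted variances; here they are simply taken as given.

```python
def n_components_for_threshold(variances, threshold=0.80):
    """Smallest k such that the k largest variances reach `threshold` of the total."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    # Walk through the components from largest to smallest variance.
    for k, variance in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return k
    return len(variances)


# Hypothetical per-component variances for a single gene.
example_variances = [4.0, 2.0, 1.5, 1.0, 0.8, 0.4, 0.3]
k = n_components_for_threshold(example_variances, threshold=0.80)
print(k)
```

A fixed threshold keeps the number of model terms per gene small while retaining most of the SNP variation, which matters when the same model is fitted to thousands of genes.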
Keywords/Search Tags:Gene-environment Interaction, Sparse Principal Component, Varying Coefficient Model, Non-Linear relationship