
Variable Selection in Varying Multi-Index Coefficient Models with Applications to Gene-Environmental Interactions

Posted on: 2018-10-31
Degree: Ph.D.
Type: Dissertation
University: Michigan State University
Candidate: Guan, Shunjie
Full Text: PDF
GTID: 1470390020455590
Subject: Statistics
Abstract/Summary:
Variable selection is an important topic in the modern statistics literature, and the varying multi-index coefficient model (VMICM) is a promising tool for studying the synergistic interaction effects between genes and multiple environmental exposures. In this dissertation, we proposed a variable selection approach for the VMICM and extended it to generalized and quantile regression settings. The theoretical properties, simulation performance, and application to genetic research of each approach were studied.

Complex diseases have both environmental and genetic risk factors, and a large amount of research has been devoted to identifying gene-environment (GxE) interactions. A GxE interaction is defined as a different effect of a genotype on disease risk in persons with different environmental exposures (Ottman (1996)), so environmental exposures can be viewed as modulating factors in the effect of a gene. Based on this idea, we derived a three-stage variable selection approach that classifies the effect of each gene variable as varying, constant, or zero, corresponding respectively to a nonlinear GxE effect, no GxE effect, and no genetic effect. When there are multiple environmental exposure variables, we also select and estimate the important environmental variables that contribute to the synergistic interaction effect. We theoretically evaluated the oracle property of the three-stage estimation method and conducted simulation studies to evaluate its finite-sample performance with both continuous and discrete predictors. An application to a real data set demonstrated the utility of the method.

In Chapter 3, we generalized this variable selection approach to the binary response setting. Instead of minimizing a penalized squared error loss, we maximized a penalized log-likelihood function. We also theoretically evaluated the oracle property of the proposed selection approach in the binary response setting and demonstrated the performance of the model via simulation. Finally, we applied the model to a Type II diabetes data set.

Compared with conditional mean regression, conditional quantile regression can provide a more comprehensive understanding of the distribution of the response variable at different quantiles. Even if the center of the distribution is the only interest, median regression (a special case of quantile regression) can offer a more robust estimator. Hence, we extended our three-stage variable selection approach to the quantile regression setting in Chapter 4. We demonstrated the finite-sample performance of the model via extensive simulation and applied it to a birth weight data set.
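
The robustness claim for median regression can be illustrated with a small sketch. It is not the dissertation's method: it uses only the standard quantile check loss rho_tau(u) = u * (tau - 1[u < 0]) and fits a single constant to a toy sample, with no covariates, index functions, or penalty. The function names and the data are hypothetical and chosen for illustration only.

```python
def check_loss(u, tau):
    """Standard quantile check loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(c, ys, tau):
    """Sum of check losses when every response is predicted by the constant c."""
    return sum(check_loss(y - c, tau) for y in ys)


def best_constant(ys, tau):
    """Constant minimizing the total check loss.

    The objective is convex and piecewise linear in c, so a minimizer
    is attained at one of the observed values; we search only those.
    """
    return min(sorted(ys), key=lambda c: total_loss(c, ys, tau))


# Toy sample (hypothetical) with one gross outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(ys, 0.5)   # tau = 0.5 corresponds to median regression
mean_fit = sum(ys) / len(ys)          # squared-error loss gives the sample mean

print("median fit:", median_fit)
print("mean fit:", mean_fit)
```

In this toy sample the outlier pulls the squared-error fit far from the bulk of the data, while the tau = 0.5 fit stays at the middle observation. Changing tau targets other parts of the response distribution, which is the sense in which quantile regression gives a fuller picture than mean regression.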
Keywords/Search Tags:Variable selection, Model, Environmental, Varying, Data set, Interaction, Quantile regression, Simulation