
Several Studies On Marginal Models And Mixed Effects Models For High-dimensional Longitudinal Data

Posted on: 2014-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P R Xu
Full Text: PDF
GTID: 1220330398986400
Subject: Probability theory and mathematical statistics
Abstract/Summary:
This dissertation develops methodology for marginal models and mixed effects models in high-dimensional longitudinal data analysis.

Longitudinal data typically consist of repeated observations on a number of individuals at different time points, and thus combine cross-sectional and time-series information. Such data arise frequently in psychological studies, the social sciences, medical studies and the biological sciences. Because they combine cross-sectional and time-series data, they make it possible to analyse both the trends of individuals and the overall trend of change. Models for longitudinal data have recently attracted considerable attention from statisticians. In this dissertation, we focus on two types of models: marginal models and mixed effects models. We discuss parameter estimation and variable selection under both types of models for high-dimensional longitudinal data.

In Chapter 2, for the logistic regression model, which is a special generalized linear model, we propose a two-stage shrinkage approach for simultaneous variable selection and parameter estimation. Its main idea is first to construct a weighted least-squares type function using a special weighting scheme on the non-conservative vector field of the generalized estimating equations model, and then to produce sparse estimates of the regression coefficients in the spirit of the adaptive Least Absolute Shrinkage and Selection Operator (Zou, 2006). The proposed procedure enjoys the oracle properties in a high-dimensional framework where the number of parameters grows to infinity with the number of clusters; that is, with probability tending to 1, it selects the subset consisting of all the indices of the nonzero coefficients, and the estimators of the nonzero coefficients are asymptotically normal.
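The second stage described above follows the adaptive LASSO idea, which can be illustrated on a much simpler problem. The sketch below is not the dissertation's GEE-based procedure: it assumes an ordinary linear least-squares loss with independent observations, uses unpenalized least squares as the first stage, and solves the weighted L1 problem by coordinate descent. The function names, the simulated data and the tuning value `lam=0.1` are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink the scalar z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def adaptive_lasso(X, y, lam, gamma=1.0, n_sweeps=200):
    """Two-stage adaptive LASSO for a linear model (illustrative sketch only)."""
    n, p = X.shape
    # Stage 1: unpenalized least squares gives initial estimates, which
    # define the adaptive weights 1 / |beta_init_j|^gamma.
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]
    weights = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)
    # Stage 2: coordinate descent on
    #   (1 / 2n) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]  # take coordinate j out of the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * weights[j]) / col_scale[j]
            resid -= X[:, j] * beta[j]  # put the updated coordinate back
    return beta


# Simulated example (hypothetical data): three nonzero coefficients out of eight.
rng = np.random.default_rng(0)
n, p = 200, 8
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_hat = adaptive_lasso(X, y, lam=0.1)
selected = np.flatnonzero(beta_hat).tolist()  # indices kept by the penalty
```

Coefficients whose first-stage estimates are close to zero receive large weights and are thresholded out, while large coefficients are penalized only lightly, which is the mechanism behind the oracle-type behaviour of adaptive penalties.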
Moreover, we prove the consistency of the sandwich formula for the covariance matrix even when the working correlation matrix is misspecified, and we develop a consistent penalized quadratic form function criterion for selecting the tuning parameter. Finally, we extend the technique to general marginal longitudinal generalized linear models.

With the rapid development of computing power and other modern technology, high-throughput data sets of unprecedented size and complexity are often encountered in statistical studies, such as gene expression data from DNA microarray experiments. However, there are virtually no solutions for feature screening in the ultra-high dimensional longitudinal gene expression data setting. To fill this gap, we propose a novel GEE-based screening procedure in Chapter 3, which relies only on the specification of the first two marginal moments and a working correlation structure. Unlike existing screening methods, the new method requires only a single evaluation of the estimating functions instead of fitting all separate marginal models or computing each pairwise correlation. We show that the proposed method is robust to misspecification of the correlation structure and enjoys the sure screening property.

Motivated by an analysis of a real longitudinal data set from an epileptic seizure study, we propose a marginal generalized single-index model in Chapter 4. To make the index identifiable in estimation, we first use the "remove-one-component" method for re-parametrization. We then use a kernel GEE-type method to estimate the unknown link function and a profile-type method to estimate the unknown index. We prove that the estimator of the index is root-n consistent, and we establish the asymptotic properties of the nonparametric estimator of the generalized single-index function.
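The "remove-one-component" re-parametrization mentioned above can be sketched as follows. The abstract does not state the exact convention used, so this assumes the common one from the single-index literature: the index vector has unit norm and a positive first component, and the first component is expressed through the remaining free ones. The function names are hypothetical.

```python
import numpy as np


def index_from_free_params(phi):
    """Map free parameters phi (inside the unit ball) to a unit-norm index.

    Assumed convention: beta = (sqrt(1 - ||phi||^2), phi), so beta has
    Euclidean norm 1 and a strictly positive first component.
    """
    phi = np.asarray(phi, dtype=float)
    sq_norm = float(phi @ phi)
    if sq_norm >= 1.0:
        raise ValueError("phi must lie strictly inside the unit ball")
    return np.concatenate(([np.sqrt(1.0 - sq_norm)], phi))


def free_params_from_index(beta):
    """Inverse map: normalize beta, fix the sign, and drop the first component.

    Assumes the first component of beta is nonzero.
    """
    beta = np.asarray(beta, dtype=float)
    beta = beta / np.linalg.norm(beta)
    if beta[0] < 0:
        beta = -beta  # the sign flip can be absorbed by the unknown link
    return beta[1:]


beta = index_from_free_params([0.6, 0.0])
phi_back = free_params_from_index([-4.0, -3.0, 0.0])
```

Working with the free parameters removes the scale and sign ambiguity of the index, so an optimizer can search over an open set without an explicit norm constraint.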
A quasi-Fisher scoring type algorithm is also developed to estimate the unknown link function and the index iteratively.

In Chapter 5, we develop a double penalized hierarchical likelihood for simultaneously selecting fixed and random effects in generalized linear mixed models. The proposed method not only avoids calculating the high-dimensional integral needed to define an objective function for effect selection, but also guarantees the positive definiteness of the covariance matrix of the selected random effects through a Cholesky decomposition. We show that the resulting estimator enjoys the oracle properties without requiring convexity of the loss function. Moreover, a two-stage algorithm is proposed to implement this approach effectively, and an H-likelihood-based Bayesian information criterion is developed for tuning parameter selection.

Finally, we compare each proposal in this dissertation with related alternatives through comprehensive simulation studies to illustrate the efficiency of our proposals. We also demonstrate their use through a wide range of real data applications, including data from a crossover trial, a phase II study of the anti-cancer inhibitor CCI-779, an epileptic seizure study, and a multi-center AIDS cohort study.
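The role of the Cholesky factor in Chapter 5 can be illustrated with a small sketch. It shows only the general linear-algebra fact, not the dissertation's penalized estimator: writing a covariance matrix as L Lᵀ with L lower triangular keeps it positive semidefinite for any parameter values, and zeroing an entire row of L removes the corresponding random effect while the covariance of the remaining effects stays positive definite as long as their diagonal entries in L are nonzero. The function name and the numbers are hypothetical.

```python
import numpy as np


def covariance_from_cholesky(theta, q):
    """Build a q x q covariance matrix from q(q+1)/2 unconstrained numbers.

    theta fills the lower triangle of L row by row, and the returned
    matrix is L @ L.T, which is positive semidefinite by construction.
    """
    L = np.zeros((q, q))
    L[np.tril_indices(q)] = theta
    return L @ L.T


# Three random effects; the third row of L is set entirely to zero,
# which mimics a penalty having removed the third random effect.
theta = [1.0,
         0.5, 2.0,
         0.0, 0.0, 0.0]
Sigma = covariance_from_cholesky(theta, q=3)

# Covariance of the two retained random effects and its smallest eigenvalue.
Sigma_kept = Sigma[:2, :2]
min_eig = np.linalg.eigvalsh(Sigma_kept).min()
```

Because the constraint is built into the parametrization, an optimizer can update the entries of L freely without ever producing an invalid covariance matrix.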
Keywords/Search Tags: Longitudinal Data, Generalized Estimating Equations, Generalized Single-index Models, Generalized Linear Mixed Models, Variable Selection, Oracle Properties, Sure Screening Property, Sparsity, Penalized Likelihood