
Statistical Inference For Semiparametric Regression Models With Longitudinal Data

Posted on: 2015-09-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Q Tian
GTID: 1220330452953429
Subject: Statistics
Abstract/Summary:
In this thesis, we are mainly interested in efficient empirical likelihood inference and variable selection for semiparametric regression models with a class of longitudinal data, including high-dimensional longitudinal data and longitudinal data with measurement error.

Longitudinal data typically refer to data in which individuals are measured repeatedly at different times. The main characteristic of longitudinal data is that observations are usually correlated within a subject and independent between subjects. It is well known that ignoring the correlation among different observations from the same individual can lead to inefficient estimation. Hence, the challenge in longitudinal data analysis is how to account for the within-subject correlation. Recently, the analysis of longitudinal data has attracted considerable attention from statisticians. Semiparametric regression models include both parametric and nonparametric components, so they combine many advantages of parametric and nonparametric models. They can make full use of the information in the data, and they offer a wider range of applications and stronger interpretability. Hence, studying semiparametric regression models with longitudinal data has both theoretical and practical significance.

Based on quadratic inference functions, generalized estimating equations, and related methods, this thesis studies a class of semiparametric regression models with longitudinal data, including generalized linear models, partially linear models, and varying coefficient partially linear models. More specifically, the research contents of this thesis are summarized as follows.

For generalized linear models, we propose a generalized empirical likelihood method that combines generalized estimating equations and quadratic inference functions based on the working correlation matrix.
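The combination of estimating equations, quadratic inference functions, and empirical likelihood can be illustrated with a minimal sketch. It assumes a linear model (identity link, unit variance) and approximates the inverse working correlation by a combination of two basis matrices, the identity and a matrix of off-diagonal ones (a common choice for an exchangeable structure, assumed here). Each subject's scores under the two basis matrices are stacked into an extended score, and the usual empirical log-likelihood ratio is computed for those estimating functions at a fixed β. The helpers `extended_score` and `el_log_ratio` are hypothetical names for illustration; the thesis's exact estimating functions and calibration may differ.

```python
import numpy as np

def extended_score(X, y, beta):
    """Hypothetical QIF-style extended score for one subject (linear model).

    With identity link and unit variance, D_i = X_i and A_i = I. The inverse
    working correlation is approximated by a1*M1 + a2*M2, with M1 = I and
    M2 = exchangeable off-diagonal ones (an assumption for this sketch).
    """
    m = X.shape[0]
    resid = y - X @ beta
    M1 = np.eye(m)
    M2 = np.ones((m, m)) - np.eye(m)
    return np.concatenate([X.T @ M1 @ resid, X.T @ M2 @ resid])

def el_log_ratio(G, tol=1e-10, max_iter=100):
    """-2 log empirical likelihood ratio for estimating functions G (n x q).

    Maximizes sum_i log(1 + lam'g_i) over the Lagrange multiplier lam by
    Newton steps, halving a step whenever an implied weight
    1 / (n * (1 + lam'g_i)) would reach 1 or more.
    """
    n, q = G.shape
    lam = np.zeros(q)
    for _ in range(max_iter):
        denom = 1.0 + G @ lam
        grad = (G / denom[:, None]).sum(axis=0)
        hess = -(G / denom[:, None] ** 2).T @ G
        step = -np.linalg.solve(hess, grad)  # Newton ascent direction
        t = 1.0
        while np.any(1.0 + G @ (lam + t * step) <= 1.0 / n):
            t /= 2.0
        lam = lam + t * step
        if np.linalg.norm(t * step) < tol:
            break
    return 2.0 * np.sum(np.log(1.0 + G @ lam))

# Simulated longitudinal data: 200 subjects, 3 visits, exchangeable errors.
rng = np.random.default_rng(0)
n, m = 200, 3
beta_true = np.array([1.0, -0.5])
G_true, G_far = [], []
for _ in range(n):
    X = rng.normal(size=(m, 2))
    e = rng.normal() + rng.normal(size=m)  # shared subject effect
    y = X @ beta_true + e
    G_true.append(extended_score(X, y, beta_true))
    G_far.append(extended_score(X, y, beta_true + 1.0))
stat_true = el_log_ratio(np.array(G_true))
stat_far = el_log_ratio(np.array(G_far))
```

Under standard empirical likelihood theory, the statistic at the true β built from q = 2p = 4 estimating functions would be compared with a χ²₄ quantile, while at the shifted β it is far larger. The working-correlation nuisance parameters a1 and a2 never need to be estimated, because the basis matrices enter only through the stacked score.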
The generalized empirical likelihood method can handle the within-subject correlation without directly estimating the nuisance parameters in the correlation matrix. Under suitable conditions, the generalized empirical log-likelihood ratios are proven to be asymptotically chi-squared. In addition, the method is extended to partially linear models: a generalized empirical likelihood ratio for the regression coefficients and an empirical likelihood ratio for the baseline function are constructed, and a nonparametric version of Wilks' theorem for the limiting distribution of the empirical likelihood ratio is derived. Simulation results show that the proposed methodology performs well.

For the varying coefficient partially linear model with longitudinal data, we propose a variable selection procedure that combines basis function approximations with penalized quadratic inference functions. The procedure selects the significant variables in the parametric and nonparametric components simultaneously, and the proposed nonparametric estimator attains the optimal rate of convergence. With an appropriate choice of the tuning parameters, we establish the consistency and asymptotic normality of the resulting estimators. In addition, we propose an algorithm to implement the estimators, discuss how to select the tuning parameter λ, and establish the asymptotic properties of a BIC-type tuning parameter selector. Extensive Monte Carlo simulation studies and a real data application are conducted to examine the finite sample performance of the proposed variable selection procedure.

For the errors-in-variables (EV) regression model with longitudinal data, we first focus on variable selection for the linear EV model. A new bias-corrected variable selection procedure is proposed by combining quadratic inference functions with shrinkage estimation.
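The idea behind bias correction in the linear EV model can be sketched under simplifying assumptions: the observed covariates are W = X + U, the measurement error covariance Σ_u is known, errors are independent over time, and an independence working matrix stands in for the quadratic inference functions and shrinkage step used in the thesis. Measurement error inflates E[W_i'W_i] by m_i Σ_u, so the naive estimating equation is attenuated toward zero; adding m_i Σ_u β back removes that bias. The function name `corrected_ee` is hypothetical.

```python
import numpy as np

def corrected_ee(W_list, y_list, Sigma_u):
    """Hypothetical bias-corrected estimator for a linear EV model.

    Solves sum_i [W_i'(y_i - W_i b) + m_i * Sigma_u @ b] = 0 with an
    independence working matrix; Sigma_u = 0 gives the naive estimator.
    """
    p = Sigma_u.shape[0]
    A = np.zeros((p, p))
    c = np.zeros(p)
    for W, y in zip(W_list, y_list):
        m = W.shape[0]
        A += W.T @ W - m * Sigma_u  # remove the m_i * Sigma_u inflation
        c += W.T @ y
    return np.linalg.solve(A, c)

rng = np.random.default_rng(1)
n, m, p = 1000, 4, 2
beta_true = np.array([1.0, 2.0])
Sigma_u = 0.5 * np.eye(p)  # assumed known measurement error covariance
W_list, y_list = [], []
for _ in range(n):
    X = rng.normal(size=(m, p))  # true, unobserved covariates
    W = X + rng.multivariate_normal(np.zeros(p), Sigma_u, size=m)
    y = X @ beta_true + rng.normal() + rng.normal(size=m)  # exchangeable errors
    W_list.append(W)
    y_list.append(y)
beta_naive = corrected_ee(W_list, y_list, np.zeros((p, p)))
beta_corr = corrected_ee(W_list, y_list, Sigma_u)
```

Here the naive estimate converges to β/1.5 because E[W'W] per observation is 1.5 I, while the corrected estimate stays near β. For a quadratic-inference-function basis matrix with a zero diagonal, E[U_i'MU_i] = 0 when measurement errors are independent over time, so in that setting only the identity component would need this correction.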
With an appropriate choice of the tuning parameters, we show that the bias-corrected variable selection procedure is consistent and that the estimators of the regression coefficients have the oracle property. Second, we extend this bias-corrected variable selection procedure to the semiparametric errors-in-variables regression model, using a kernel smoother to estimate the nonparametric function. The bias-corrected procedure attains estimation efficiency and accounts for the within-subject correlation of longitudinal data without estimating any correlation parameters. Under some regularity conditions, we show that the procedure identifies the true model consistently and that the penalized quadratic inference function estimators have the oracle property. Simulated examples show that the proposed methods work well.

Finally, we propose an automatic variable selection procedure for high-dimensional partially linear models with longitudinal data, called the smooth-threshold generalized estimating equation. The procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero and simultaneously estimates the nonzero regression coefficients. It handles the within-subject correlation through generalized estimating equations. Compared with shrinkage methods, our approach is easy to implement and does not require solving any convex optimization problem. In addition, under some regularity conditions, we establish the asymptotic properties in a high-dimensional framework where the number of covariates p increases with the number of clusters n. Simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure. It is well known that the generalized estimating equation method estimates the regression parameter consistently even if the working correlation is misspecified.
However, under such misspecification, the estimator of the regression parameter is inefficient. The quadratic inference function approach was proposed as an improvement on generalized estimating equations. Hence, we propose a new automatic variable selection procedure that uses the smooth-threshold estimating equation and quadratic inference functions for high-dimensional linear models with longitudinal data. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedure.
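The smooth-threshold mechanism can be illustrated in the simplest setting: a linear model fitted with an independence working matrix, where the estimating function is the score U(β) = N⁻¹ Σ_i X_i'(y_i − X_iβ). Writing the smooth-threshold equation for this score as (I − Δ)U(β) = Δβ, with Δ = diag(δ_j), δ_j = min{1, λ/|β̃_j|^{1+τ}} and β̃ an initial unpenalized estimate, every coefficient with δ_j = 1 is set exactly to zero and the rest solve a linear system, so no convex optimization is needed. The weight formula is the standard smooth-threshold weight, assumed here rather than taken from the thesis; `smooth_threshold_ee`, λ = 0.05, and τ = 1 are illustrative, and the partially linear and quadratic-inference-function versions are not shown.

```python
import numpy as np

def smooth_threshold_ee(X, y, lam, tau=1.0):
    """Hypothetical smooth-threshold estimating equation for a linear model.

    X is the (N, p) design stacked over all subjects and y the stacked
    responses. With an independence working matrix the score is
    U(beta) = X'(y - X beta) / N; solves (I - Delta) U(beta) = Delta beta.
    """
    N, p = X.shape
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalized initial fit
    with np.errstate(divide="ignore"):
        delta = np.minimum(1.0, lam / np.abs(beta_init) ** (1.0 + tau))
    active = delta < 1.0  # delta_j = 1 forces beta_j = 0 exactly
    beta = np.zeros(p)
    if active.any():
        XA, dA = X[:, active], delta[active]
        # active rows of the equation, with inactive coefficients fixed at 0
        lhs = (1.0 - dA)[:, None] * (XA.T @ XA) / N + np.diag(dA)
        rhs = (1.0 - dA) * (XA.T @ y) / N
        beta[active] = np.linalg.solve(lhs, rhs)
    return beta, delta

rng = np.random.default_rng(2)
n, m, p = 300, 4, 8
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
X = rng.normal(size=(n * m, p))
subject_effect = np.repeat(rng.normal(size=n), m)  # within-subject correlation
y = X @ beta_true + subject_effect + rng.normal(size=n * m)
beta_hat, delta = smooth_threshold_ee(X, y, lam=0.05)
```

With τ = 1, a coefficient is dropped when |β̃_j| ≤ √λ, so small initial estimates are thresholded while large ones receive only mild shrinkage. A correlated working matrix or quadratic inference functions would change U(β) but not the thresholding step; in practice λ would be chosen by a criterion such as BIC rather than fixed.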
Keywords/Search Tags: Longitudinal data, Empirical likelihood, Quadratic inference functions, Generalized estimating equation, Variable selection