Penalized Generalized Estimating Equations For High-dimensional Longitudinal Data Analysis

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Wang
GTID: 2480306533995999
Subject: Mathematics
Abstract/Summary:
Longitudinal data (also called clustered data or panel data) arise when the same individual is observed repeatedly. Observations on the same individual are correlated, while observations on different individuals are independent. The generalized linear model is a generalization of the classical linear model. It can be used to analyze both continuous and discrete data, and is especially suited to discrete data (categorical data and count data) and non-negative data. The generalized estimating equation (GEE) extends the generalized linear model to longitudinal data. A remarkable feature of GEE is that, as long as the mean function is correctly specified, the regression parameter estimator remains consistent and asymptotically normal even if the working correlation matrix (or covariance matrix) is misspecified. If the variance is also correctly specified, the variance of the estimator is the smallest.

In regression analysis, high-dimensional covariates are increasingly common. Sometimes, even when there are few variables, the number of covariates becomes large once interaction terms are considered. Some of these covariates are unrelated, or only weakly related, to the response variable. Including them in the model harms the accuracy and efficiency of statistical inference. It is therefore important to select the relevant covariates, that is, to perform variable selection.

This thesis mainly studies variable selection for high-dimensional GEE (where the number of non-zero regression coefficients may also diverge), that is, the asymptotic properties of the penalized generalized estimating equation (PGEE). Under weaker conditions, it is proved that model selection is consistent, and the oracle property of variable selection is obtained. Under the same conditions, the existence, consistency and asymptotic normality of the regression parameter estimator of the generalized estimating equation with a diverging covariate dimension are obtained. These results improve on those of Wang, Zhou and Qu (Biometrics, 68:353-360) and Wang (Ann. Statist. 39:389-417). Finally, numerical simulations with varying sample sizes are used to study the accuracy and stability of the estimates under different correlation structures, and the simulation results of PGEE and GEE are compared.

Specifically, the improvements over Biometrics, 68:353-360 are as follows:
(1) When the moment generating function of the response variable is uniformly bounded (which common models satisfy), the condition on the covariate dimension relative to the sample size n is stated in terms of any positive integer power of the covariate dimension.
(2) If the moment generating function of the response variable is uniformly bounded, then the r-th moment of the response variable is uniformly bounded as well, and the covariate dimension can reach the r/2 power of the sample size.
(3) The uniform boundedness of the covariate vector is weakened to uniform boundedness of each of its elements.
(4) The eigenvalues of the Fisher information matrix formed from all covariates are of order n, and the eigenvalues of the information matrix formed from the covariates corresponding to the non-zero regression coefficients are of order n.

The improvement over Ann. Statist. 39:389-417 is to weaken the uniform boundedness of the first and second derivatives of the link function in a neighborhood of the true parameter value to uniform boundedness at the true value.
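The robustness of GEE to a misspecified working correlation, noted in the abstract, can be illustrated with a small simulation. The following is a minimal sketch and not the thesis's own experiment: it assumes an identity link (so the GEE solution has a closed form), an AR(1) true error correlation, and hypothetical names and settings (`simulate`, `gee_identity_link`, 2000 subjects, 4 visits). No penalty is applied, so this is plain GEE rather than PGEE.

```python
import numpy as np


def simulate(n_subjects, n_visits, beta, rho, rng):
    """Simulate longitudinal data with AR(1)-correlated errors within each subject."""
    p = len(beta)
    X = rng.normal(size=(n_subjects, n_visits, p))
    idx = np.arange(n_visits)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])  # true AR(1) correlation
    errors = rng.multivariate_normal(np.zeros(n_visits), cov, size=n_subjects)
    y = X @ beta + errors  # identity link: E[y | X] = X beta
    return X, y


def gee_identity_link(X, y, working_corr):
    """Solve sum_i X_i^T R^{-1} (y_i - X_i beta) = 0 for a fixed working correlation R."""
    W = np.linalg.inv(working_corr)
    lhs = np.einsum("imp,mk,ikq->pq", X, W, X)  # sum_i X_i^T W X_i
    rhs = np.einsum("imp,mk,ik->p", X, W, y)    # sum_i X_i^T W y_i
    return np.linalg.solve(lhs, rhs)


rng = np.random.default_rng(0)
true_beta = np.array([1.0, -0.5, 0.0])
n_visits = 4
X, y = simulate(n_subjects=2000, n_visits=n_visits, beta=true_beta, rho=0.6, rng=rng)

# Two working correlations, both different from the true AR(1) structure.
independence = np.eye(n_visits)
exchangeable = np.full((n_visits, n_visits), 0.5) + 0.5 * np.eye(n_visits)

beta_indep = gee_identity_link(X, y, independence)
beta_exch = gee_identity_link(X, y, exchangeable)

print("independence working correlation:", beta_indep)
print("exchangeable working correlation:", beta_exch)
```

Both working correlations differ from the AR(1) structure used to generate the data, yet both estimates should land close to `true_beta`, because only the mean model needs to be correct for consistency. The choice of working correlation affects efficiency rather than consistency.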
Keywords: Longitudinal data, Penalized generalized estimating equations, High-dimensional covariates, Variable selection, Generalized linear model