Variable Selection Of High Dimensional Models With Longitudinal Data

Posted on: 2022-12-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Gao
GTID: 1527306602485504
Subject: Statistics
Abstract/Summary:
With the development of experimental techniques, data acquisition has become easier, producing data sets that are large in volume but complex in structure. Big data carries a great deal of information, but its complex structure calls for more advanced statistical methods. High-dimensional longitudinal data is a case in point: it is rich in information, yet its structure is complex. Moreover, censoring, heavy tails, group structure, and contamination often occur in high-dimensional longitudinal data and bring additional difficulties. It is therefore of great theoretical and practical significance to further develop statistical methods that extract information effectively from high-dimensional longitudinal data.

The characteristics of high-dimensional longitudinal data make both theoretical derivation and the computation of estimators difficult, and the situation is harder still when censoring, heavy tails, contamination, and group structure are present. Specifically, the difficulties include, but are not limited to, the following. First, the dimension exceeds the sample size, so low-dimensional methods such as least squares, maximum likelihood, quasi-likelihood, pseudo-likelihood, and estimating equations are no longer applicable. Traditional methods must be modified to handle this, which complicates both computation and the derivation of large-sample properties. Second, because observations within a subject are correlated, methods designed for independent data are unsuitable, and methods that exploit intra-subject correlation are needed to improve estimation efficiency.

When censoring, heavy tails, contamination, and group structure occur in high-dimensional longitudinal data, the extra difficulties include, but are not limited to, the following. The loss function of the quantile model is not differentiable at the origin, so this point must be handled with care when deriving large-sample properties. Nonparametric terms require spline approximation or local polynomial estimation, but this does not amount to turning the nonparametric model into a fully parametric one, so proving consistency of both the parametric and nonparametric parts is difficult. Right-censored high-dimensional longitudinal data requires bias correction, but it is hard to construct weights that correct the bias without losing the intra-subject correlation information. Heavy-tailed or contaminated data requires robust estimation, and constructing an estimator that is doubly robust is difficult. When the covariates have group structure, variables must be selected consistently both between and within groups, and proving this bi-level variable selection consistency is difficult.

It should be pointed out that when one or more of censoring, heavy-tailed distributions, contamination, and group structure appear in a high-dimensional longitudinal data model, the tasks of proposing an estimation method, deriving large-sample properties, designing an algorithm to compute the estimator, and writing the program differ from one combination of cases to another. Each case poses its own difficulties in all four tasks.

In view of these problems, this dissertation presents variable selection methods for ultra-high-dimensional longitudinal data models with heavy-tailed errors and contaminated covariates, with right-censored responses, and with group structure. The large-sample properties of the estimators are established, and algorithms for computing them are given. The advantages of the proposed methods are further verified by simulation studies and empirical analysis.

Specifically, to address heavy-tailed distributions and contaminated covariates in ultra-high-dimensional longitudinal data, this dissertation studies the quantile regression model for such data and proposes a weighted adaptive lasso (WAR-Lasso) method. It handles the simultaneous presence of a quantile loss function that is not differentiable at the origin, covariate contamination, and intra-subject correlation, which existing methods cannot. The WAR-Lasso method is doubly robust and consistently selects and estimates the significant variables.

To address right censoring in ultra-high-dimensional longitudinal data and group structure in the model, this dissertation studies the accelerated failure time (AFT) model with group structure for ultra-high-dimensional longitudinal data and proposes a quadratic inference function adaptive bridge (QA-gbridge) method. Existing methods cannot simultaneously adjust the weights of the AFT model for ultra-high-dimensional longitudinal data, achieve bi-level selection consistency, and retain the intra-subject correlation information; QA-gbridge does all three. Estimation efficiency is improved and the corresponding algorithm is fast. The QA-gbridge method possesses bi-level variable selection consistency, estimation consistency, and asymptotic normality.

To address right censoring in ultra-high-dimensional longitudinal data together with group structure and nonparametric terms in the model, a partially linear AFT model with group structure for ultra-high-dimensional longitudinal data is studied, and a group bridge smooth-threshold weighted generalized estimating equation method for longitudinal data (LDGBW-SGEE) is proposed. Unlike other methods, LDGBW-SGEE handles right censoring and nonparametric terms at the same time while retaining bi-level variable selection consistency. It possesses bi-level selection consistency for the parametric part, consistency of the parameter estimates, consistency of the nonparametric estimates, and asymptotic normality.

To address right censoring, heavy-tailed response variables, and group structure in ultra-high-dimensional longitudinal data, this dissertation studies the quantile AFT model with group structure for ultra-high-dimensional longitudinal data and proposes an adaptive group bridge penalized quantile quadratic inference function (QA-quan-gbridge) method. It handles right censoring, heavy-tailed response variables, and group structure at the same time, which other methods cannot. The QA-quan-gbridge method possesses bi-level variable selection consistency, parameter estimation consistency, and asymptotic normality.

To address right censoring, heavy-tailed response variables, group structure, and nonparametric terms in the model with ultra-high-dimensional longitudinal data, the quantile partially linear AFT model with ultra-high-dimensional longitudinal data is studied. A bridge smooth-threshold weighted generalized estimating equation (QLDGBW-SGEE) method is proposed, which handles right censoring, heavy-tailed response variables, group structure, and nonparametric terms at the same time, while other methods cannot. The QLDGBW-SGEE method possesses bi-level variable selection consistency, consistency of the parametric and nonparametric estimates, and asymptotic normality.
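The abstract refers several times to the quantile loss function being singular at the origin. As a minimal illustration only, assuming the standard check loss rho_tau(u) = u * (tau - 1{u < 0}) from quantile regression (the abstract does not state the exact loss used in the dissertation), the sketch below shows that the left and right slopes at zero differ, which is why ordinary derivative-based arguments break down there. The function names are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def check_loss(u, tau):
    """Standard quantile check loss rho_tau(u) = u * (tau - 1{u < 0}).

    Assumed form for illustration; not taken from the dissertation itself.
    """
    u = np.asarray(u, dtype=float)
    # Slope is (tau - 1) for negative residuals and tau otherwise.
    return u * np.where(u < 0, tau - 1.0, tau)

def one_sided_slopes_at_zero(tau, h=1e-6):
    """Finite-difference slopes of the check loss just left and right of 0."""
    left = (check_loss(0.0, tau) - check_loss(-h, tau)) / h
    right = (check_loss(h, tau) - check_loss(0.0, tau)) / h
    return float(left), float(right)

if __name__ == "__main__":
    for tau in (0.25, 0.5, 0.75):
        left, right = one_sided_slopes_at_zero(tau)
        print(f"tau={tau}: left slope={left:.3f}, right slope={right:.3f}")
```

The two slopes are tau - 1 and tau, so they never coincide for tau strictly between 0 and 1. This kink is what proofs of large-sample properties for quantile-based estimators have to work around.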
Keywords/Search Tags: ultra-high-dimensional longitudinal data, right censored data, variable selection, double robust estimation, quantile regression models