Model Selection And Parameter Estimation With Incomplete Longitudinal Data

Posted on: 2020-04-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J Wang
GTID: 1480306005990889
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the rapid development of technology, longitudinal data appear widely in econometrics, finance, medicine, chemistry, biology and related fields. Longitudinal data are data observed repeatedly on the same group of individuals or experimental units at different time points. It is generally assumed that observations from different individuals are independent, while observations from the same individual are correlated. In practice, however, and especially with the rise of "big data" technology, some of the longitudinal observations are missing or unrecorded, for example in single-cell RNA data, national population censuses, and studies of the therapeutic effect of new drugs. It is therefore of practical significance to study model selection and estimation with missing longitudinal data.

The two core issues discussed in this dissertation are the within-subject correlation of longitudinal data and the handling of missing data. The methods used to handle within-subject correlation include generalized estimating equations (GEE), quadratic inference functions (QIF), and nonparametric moving blocks. The methods used to handle missing data include direct imputation through matrix factorization, traditional inverse probability weighting, partial imputation estimating equations, and augmented inverse probability weighting. The missing longitudinal data studied cover the low-dimensional, high-dimensional and ultra-high-dimensional settings. The models include the generalized linear model, the generalized partial linear model, and the partial linear varying coefficient errors-in-variables (EV) model. We also study variable selection for ultra-high-dimensional missing longitudinal data without model assumptions. The main work of the dissertation is as follows:

1. Based on the generalized linear model, parameter estimation for longitudinal data with mixed structure is studied when covariates are missing at random. First, the missing data are imputed using the weighted robust non-negative matrix factorization (WNMFP) method. Then, an estimating equation based on the generalized linear model is constructed, and the parameters are estimated by pseudo-likelihood and generalized estimating equations. We prove the convergence of the WNMFP method, as well as the consistency and asymptotic normality of the parameter estimators computed from the imputed complete data. Simulation results show that the imputation method is more effective and robust when the missing rate is high.

2. Based on the generalized partial linear model with responses missing at random, variable selection and estimation for high-dimensional missing longitudinal data are studied, and a smooth-threshold variable selection method based on quadratic inference functions is proposed. The method automatically shrinks the coefficients of insignificant variables to 0 and yields estimates of the coefficients of the important variables. It avoids the convex optimization of a penalty function required by traditional penalized methods and is easy to compute. The quadratic inference function accounts for the within-subject correlation of the longitudinal data. Under appropriate regularity conditions, in the framework of large n and diverging p, the variable selection procedure is shown to be consistent and asymptotically normal. Simulation results show that the proposed method has better finite-sample properties.

3. Without model assumptions, feature screening for ultra-high-dimensional longitudinal data is studied when responses are missing at random. A nonparametric feature screening method based on partial imputation of "local" information flows is proposed. The method accounts for both the missing data and the symmetry between the predictors and the response. It is shown that, under certain regularity conditions in the "large p, small n" framework, where the number of variables p may grow exponentially with the sample size n, the proposed method has the sure screening property. Simulation results show that the proposed method selects active variables effectively.

4. Based on the partial linear varying coefficient EV model, moving block empirical likelihood inference for longitudinal data is studied when responses are missing at random, and an inverse probability weighted moving block empirical likelihood estimator is proposed. The method uses the nonparametric idea of moving blocks to handle the within-subject correlation of the longitudinal data, which gives it a degree of stability. Under appropriate regularity conditions, the proposed moving block empirical likelihood ratio statistic converges in distribution to a chi-square distribution, so confidence regions for the parameters can be constructed effectively. Simulation results also show that the proposed method has good finite-sample properties.
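As a rough illustration of the inverse probability weighting idea mentioned in the abstract, the sketch below estimates a linear regression when responses are missing at random. It is a minimal example under assumptions that are not taken from the dissertation: the data are simulated, each observation is treated as independent (within-subject correlation and moving blocks are ignored), the observation probability follows a logistic model, and the names `fit_logistic` and `ipw_least_squares` are hypothetical helpers.

```python
import numpy as np

def fit_logistic(X, r, n_iter=25):
    """Newton-Raphson fit of P(r = 1 | X) = expit(X @ gamma)."""
    gamma = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ gamma)))
        grad = X.T @ (r - p)                         # score of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # observed information
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

def ipw_least_squares(X, y, r, pi_hat):
    """Solve sum_i (r_i / pi_i) * x_i * (y_i - x_i' beta) = 0 for beta."""
    w = r / pi_hat                    # weight is 0 for missing responses
    y_filled = np.where(r == 1, y, 0.0)  # missing y never enters: its weight is 0
    A = (X * w[:, None]).T @ X
    b = X.T @ (w * y_filled)
    return np.linalg.solve(A, b)

# Simulated data (assumption: illustrative values only).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

# Missing at random: the chance of observing y depends only on the observed x.
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
r = (rng.uniform(size=n) < p_obs).astype(float)

gamma_hat = fit_logistic(X, r)                      # model for the observation probability
pi_hat = 1.0 / (1.0 + np.exp(-(X @ gamma_hat)))     # estimated P(observed | x)
beta_ipw = ipw_least_squares(X, y, r, pi_hat)
print("IPW estimate of beta:", beta_ipw)
```

Each observed response is weighted by the inverse of its estimated probability of being observed, so that the observed cases stand in for similar cases whose responses are missing. The estimator in the dissertation combines this weighting with moving block empirical likelihood, which this sketch does not attempt to reproduce.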
Keywords/Search Tags: Longitudinal data, Missing data, Generalized estimating equation, Moving block empirical likelihood, Quadratic inference function, Variable selection, Feature screening