| Missing data are frequently encountered in fields such as biomedicine, economics, and clinical trials, for reasons that include respondents' unwillingness to answer sensitive questions, loss of information caused by uncontrollable factors, and dropout from the study. Ignoring missing data may produce seriously biased parameter estimators and inaccurate predictions, or lead to misleading conclusions. Great effort has been devoted to inference for moment models with missing data; however, existing methods can be sensitive to model misspecification and can yield inefficient estimators. Meanwhile, outliers or influential observations in the data set may distort the analysis, yet little work has been done on detecting them under moment models with missing data. In high-dimensional data analysis, growing numbers of parameters and moment restrictions are increasingly employed, but considerably less work is available in the literature for the missing-data case. Therefore, it is of great theoretical and practical significance to investigate statistical inference and diagnostics for moment models with missing data. This dissertation studies statistical inference and diagnostics for a class of possibly over-identified nonsmooth moment models, and also considers simultaneous variable selection and parameter estimation for growing-dimensional moment models in the presence of missing data. Based on augmented inverse probability weighted nonsmooth moment models built on estimated propensity score functions, we propose a semiparametric efficient generalized empirical likelihood approach for parameter estimation and for hypothesis testing under parametric restrictions. We then develop several diagnostic measures for quantifying the influence of individual observations, and construct a pseudo-residual-based stochastic process to assess the plausibility of the posited nonsmooth moment
models. We also investigate the asymptotic properties of the diagnostic measures and goodness-of-fit test statistics. Finally, we consider simultaneous variable selection and parameter estimation via penalized empirical likelihood for certain growing-dimensional moment models in the presence of missing data, and investigate the asymptotic properties of the penalized empirical likelihood. Specifically, the main content of this dissertation is as follows. 1. This dissertation investigates statistical inference for a class of possibly over-identified nonsmooth moment models with missing data. First, we leave the propensity score function of the missing-data mechanism unspecified, which avoids model misspecification, and estimate it by the series estimation method. A set of augmented inverse probability weighted moment models is then constructed as the basis for inference using a semiparametric efficient generalized empirical likelihood method. Under some regularity conditions, we establish the asymptotic properties of the generalized empirical likelihood estimators of the parameters and show that the generalized empirical likelihood ratio statistics for parametric restrictions exhibit Wilks' phenomenon. Lastly, the theoretical properties and practical performance of our approach are demonstrated through numerical simulations and a real data analysis. 2. This dissertation discusses statistical diagnostics for generalized empirical likelihood with nonsmooth moment models and missing data. First, to identify outliers and influential observations, we construct several diagnostic measures, including pseudo-residuals, Cook's distance, and the generalized empirical likelihood displacement, and investigate their asymptotic properties. Second, we construct goodness-of-fit statistics based on pseudo-residuals for assessing the plausibility of the posited moment models, and establish the asymptotic properties of the test statistics under null and local
alternative hypotheses. It is shown that the goodness-of-fit test statistics can detect local alternatives close to the null at the rate n^{1/2}. In addition, a resampling approach is applied to approximate the null distribution of the test statistics. Finally, simulation studies are conducted and a real data set is analyzed to examine the finite-sample performance of the diagnostic measures and the goodness-of-fit statistics. 3. This dissertation considers simultaneous variable selection and parameter estimation for growing-dimensional moment models with missing data. First, we assume a sparse parametric logistic regression model for the missing-data mechanism, and use the penalized likelihood method with a suitable penalty function to estimate the parameters of the propensity score model. A set of unbiased inverse probability weighted moment models is then constructed from the penalized likelihood estimator. Moreover, we propose penalized empirical likelihood with the L_{1/2} penalty function for variable selection and parameter estimation based on the growing-dimensional inverse probability weighted moment models, and establish large-sample results under regularity conditions. Lastly, simulation studies and a real data analysis are provided to evaluate the finite-sample performance of the proposed method. |
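
To make the augmented inverse probability weighting idea concrete, the following is a minimal sketch rather than the dissertation's method. It assumes the simplest smooth moment function g(Y, θ) = Y − θ (so θ is a mean), a response Y that is missing at random given a scalar covariate X, a parametric logistic propensity model in place of the series estimator, and a linear working model for E[Y | X]. All function and variable names are hypothetical, and the data are simulated purely for illustration.

```python
import math
import random


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(x, d, iters=25):
    """Newton-Raphson fit of P(d = 1 | x) = expit(a + b * x)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            g0 += di - p              # score for the intercept
            g1 += (di - p) * xi       # score for the slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_line(x, y):
    """Ordinary least squares for y = c0 + c1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    return my - c1 * mx, c1


def aipw_mean(x, y, d):
    """Solve the AIPW moment equation for theta when g(Y, theta) = Y - theta.

    y[i] is None when d[i] == 0, i.e. when the response is missing.
    """
    a, b = fit_logistic(x, d)                      # estimated propensity score
    xc = [xi for xi, di in zip(x, d) if di == 1]
    yc = [yi for yi, di in zip(y, d) if di == 1]
    c0, c1 = fit_line(xc, yc)                      # working model for E[Y | X]
    total = 0.0
    for xi, yi, di in zip(x, y, d):
        pi = expit(a + b * xi)
        m = c0 + c1 * xi
        ipw = yi / pi if di == 1 else 0.0          # inverse probability weighted term
        total += ipw - (di - pi) / pi * m          # augmentation term
    return total / len(x)


# Simulated data: the true mean of Y is 1.0, and larger X makes Y more
# likely to be observed, so the complete-case mean is biased upward.
random.seed(0)
n = 5000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y_full = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]
d = [1 if random.random() < expit(0.5 + xi) else 0 for xi in x]
y = [yi if di == 1 else None for yi, di in zip(y_full, d)]

theta_aipw = aipw_mean(x, y, d)
observed = [yi for yi in y if yi is not None]
theta_cc = sum(observed) / len(observed)
print("AIPW estimate:", theta_aipw)
print("complete-case mean:", theta_cc)
```

In this toy setting the augmented estimator targets the full-data mean, while the complete-case average does not. The generalized empirical likelihood step, nonsmooth moment functions, and over-identification treated in the dissertation are outside the scope of this sketch.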