
Quantile Regression For Nonignorable Missing Data

Posted on: 2024-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A A Yu
Full Text: PDF
GTID: 1520307307494834
Subject: Applied Statistics
Abstract/Summary:
In the past decade, there has been growing enthusiasm for using electronic medical records (EMR) for biomedical research. A major challenge in analyzing EMR data is the large amount of missing data. Most existing statistical methods for handling missing data rely on the missing at random assumption, which rarely holds for EMR data. In an EMR system, the timing of a patient’s hospital visit, as well as which variables are recorded at that visit, depends on the patient’s health condition. The probability that a value is missing therefore depends on the missing value itself, even given the observed data; that is, the missing mechanism is nonignorable. Under this mechanism, there is an identifiability problem when both the data-generating model and the missing mechanism model are fully unspecified or nonparametric. In fact, even when both models are assumed to be parametric, the parameters may still be non-identifiable. Handling nonignorable missing data thus remains a thorny problem.

On the other hand, most human disease progressions are complex, and their associations with influencing factors are highly heterogeneous across the population. Another major challenge of EMR analysis is therefore to identify and incorporate data heterogeneity and association complexity. Quantile regression can describe the heterogeneous effects of covariates on the entire conditional distribution of the response, as well as their local correlations, and can thus efficiently account for the effects of unobserved data heterogeneity. Compared with mean regression, it provides a unique insight into the complexity and heterogeneity of EMR data. Furthermore, quantile regression imposes no parametric assumptions on the distribution of the random errors, and the subgradient of its loss function is bounded, which makes it robust to heavy tails and outliers in the response, another desirable property for analyzing error-prone EMR data. Hence, this dissertation focuses on estimation and inference for quantile regression parameters based on nonignorable missing data. The research work mainly includes the following.

Firstly, we consider cross-sectional data with nonignorable nonresponse. In Chapter II, with a quantile regression model and a parametric missing mechanism model, we construct new joint unbiased estimating equations to estimate the unknown parameters. All existing methods in the literature have ignored the implicit but deterministic relationship between the conditional quantile function and the conditional probability density of the response, which leads to unnecessary parametric assumptions and consequently increases the risk of biased estimation of the quantile regression parameters. To overcome this issue, we use the conditional distribution information of the response variable Y implied by the quantile regression model to impute the missing responses, which also improves the efficiency of parameter estimation. We also provide an efficient iterative algorithm to obtain the estimates, and we study the asymptotic properties of the proposed estimator theoretically and systematically. In addition, we propose a resampling approach for statistical inference on the proposed estimator and validate this approach theoretically.

Secondly, we consider longitudinal data with non-monotone nonignorable nonresponse in Chapter III. We propose estimation methods under a parametric model and under a semi-parametric model for the missing mechanism, respectively. When the missing mechanism model is parametric, we extend the method proposed in Chapter II to longitudinal data. The inverse probability weights can vary widely when the missing rate is high or when the missing probability depends strongly on some variables; to address this, we propose using stabilized inverse propensity score weighting to further improve the performance of the method. When the missing mechanism model is semi-parametric, we first use kernel density estimation and the two-step generalized method of moments to estimate the nonparametric function and the tilting parameter in the missing mechanism model, and then obtain the quantile regression parameter estimates by inverse probability weighting and by stabilized inverse propensity score weighting. We also establish the consistency and asymptotic normality of the proposed estimators under the parametric and semi-parametric model assumptions, respectively.

Lastly, since the variables of each subject are measured repeatedly at multiple time points in a longitudinal study, longitudinal data contain within-subject correlation information. Therefore, in Chapter IV of this dissertation, building on the methods proposed in Chapter III, we put forward a method that combines the quadratic inference functions approach with the empirical likelihood approach to incorporate the within-subject correlation information, which improves the efficiency of parameter estimation. We study the asymptotic properties of the proposed estimators theoretically. For all methods proposed in this dissertation, we also verify their effectiveness through simulation studies and real data examples, and obtain interpretable conclusions accordingly.
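As a rough illustration of the inverse probability weighting idea mentioned above, the following is a minimal sketch and not the dissertation's estimator. It assumes an intercept-only quantile model (so the estimate is a weighted sample quantile) and assumes the response probabilities are already known, whereas the dissertation estimates them from a parametric or semi-parametric missing mechanism model. The function names `check_loss`, `ipw_weights`, and `weighted_quantile` are hypothetical, and the stabilized variant of the weights is not covered.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def ipw_weights(observed, propensity):
    """Inverse probability weights: 1 / pi_i if observed, 0 if missing.

    Assumption: `propensity` holds the (known) probability that each
    response is observed; under nonignorable missingness it may depend
    on the response itself.
    """
    observed = np.asarray(observed, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    return observed / propensity


def weighted_quantile(y, w, tau):
    """A minimizer over q of sum_i w_i * rho_tau(y_i - q).

    Only cases with positive weight (the observed ones) are used, so
    missing responses may be stored as NaN.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    keep = w > 0
    y, w = y[keep], w[keep]
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    cum = np.cumsum(w_sorted)
    # First sorted value whose cumulative weight reaches tau * total weight.
    idx = np.searchsorted(cum, tau * cum[-1])
    return y_sorted[idx]


# Toy data: larger responses are less likely to be observed.
y = np.array([1.0, 2.0, 3.0, 4.0, np.nan])
observed = np.array([1, 1, 1, 1, 0])
propensity = np.array([1.0, 1.0, 0.5, 0.25, 0.25])

w = ipw_weights(observed, propensity)
naive_median = weighted_quantile(y, observed, 0.5)  # ignores missingness
ipw_median = weighted_quantile(y, w, 0.5)           # reweights observed cases
```

In this toy example the weighted median is larger than the unweighted one, because the observed large responses are up-weighted to stand in for the large responses that are more often missing.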
Keywords/Search Tags: nonignorable missing, quantile regression, estimating equations, Monte Carlo integration, inverse probability weighting, stabilized inverse propensity score weighting, generalized method of moments, within-subject correlation