Feature Screening And Model Parameter Estimation For Missing Response

Posted on: 2020-01-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X X Li
GTID: 1480306005490894
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the rapid development of data collection technology, researchers in many fields can obtain ultrahigh-dimensional data at low cost. In ultrahigh-dimensional data analysis, the number of predictors p is typically allowed to grow exponentially with the sample size n, while only a few predictors have a significant effect on the response. To this end, statisticians have proposed a number of marginal feature screening methods. In practice, however, missing data arise for various reasons in economics, sociology, biomedicine, market research and many other fields, and statistical inference with missing data has attracted much attention in recent years. Classical statistical methods and theory are built on complete data and cannot be applied directly to statistical inference with missing data. The most commonly used approaches to missing data analysis include complete-case analysis, imputation, likelihood-based methods and inverse probability weighting. How to reduce the dimensionality of the predictors to a moderate scale in the presence of missing data, and then how to build a model and estimate its parameters, are important problems to be solved. To this end, within the missing data framework, this dissertation studies feature screening for ultrahigh-dimensional data, parameter estimation for generalized partially linear single-index models, and robust estimation when outliers are present in the data. The main contents of this dissertation are as follows:

1. The first part addresses feature screening for ultrahigh-dimensional data with the response missing at random. A novel nonparametric feature screening procedure is developed to identify the important features via a conditionally imputed marginal Spearman rank correlation. The proposed approach requires neither a specified regression form relating the response to the predictors nor a parametric model for the missing data mechanism. It is robust to outliers and heavy-tailed data. Under some regularity conditions, the proposed procedure is shown to have the sure screening and ranking consistency properties. Simulation studies show that it outperforms several existing model-free screening procedures. An example taken from a microarray study of diffuse large B-cell lymphoma is used to illustrate the proposed methodology.

2. The second part studies parameter estimation for generalized partially linear single-index models when the response is subject to nonignorable missingness. A semiparametric logistic regression model is assumed for the response mechanism. Combining the local likelihood technique with the propensity score method, a profile weighted estimating equations (WEE) approach is proposed to estimate the parametric and nonparametric components of generalized partially linear single-index models. Following the profile principle, a kernel-type estimator is used for the nonparametric component, and the parameter of the response mechanism is then estimated by the generalized method of moments. Asymptotic properties of the proposed estimators are established. Simulation studies and real data applications are conducted to illustrate the effectiveness and feasibility of the estimators.

3. The third part studies parameter estimation in regression models with a missing response and, simultaneously, outliers in both the response and the covariates. First, to be robust against outliers in both the response and the covariates, a weighted profile likelihood is defined to obtain robust parameter estimates in the missing data mechanism model. Second, using the first derivative of Tukey's biweight function together with the inverse probability weighting method and the redescending technique, a class of unbiased estimating equations is constructed for the parameters of the regression model. These new estimating equations not only handle the missing data but also eliminate the effect of outliers. Finally, the generalized method of moments is employed to estimate the unknown parameters of interest, and the large-sample theory of the proposed estimator is investigated. Extensive simulations and a real-data example show that the proposed method performs well when a missing response and outliers in both the response and the covariates are present.
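As an illustration of marginal screening by Spearman rank correlation, the following is a minimal Python sketch and not the dissertation's procedure. It makes one loud simplification: rows with a missing response are simply dropped (complete-case analysis) instead of being conditionally imputed, so it only conveys the ranking step. The function names, the `None` encoding of a missing response, and the simulated data are all assumptions made for this example.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant variable carries no rank information
    return cov / math.sqrt(vx * vy)


def screen_features(X, y, d):
    """Keep the d predictors with the largest |Spearman correlation| with y.

    X is a list of rows; y may contain None for a missing response.
    Simplification (assumption of this sketch): rows with missing y are
    dropped rather than imputed.
    """
    observed = [i for i, yi in enumerate(y) if yi is not None]
    y_obs = [y[i] for i in observed]
    p = len(X[0])
    scores = []
    for j in range(p):
        xj = [X[i][j] for i in observed]
        scores.append(abs(spearman(xj, y_obs)))
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:d], scores


# Hypothetical simulated data: only predictors 0 and 3 drive the response,
# and the chance that y is missing depends on predictor 1 only.
random.seed(0)
n, p = 200, 30
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = []
for row in X:
    value = 3 * row[0] + 3 * row[3] + random.gauss(0, 1)
    p_missing = 0.5 if row[1] > 0 else 0.2
    y.append(None if random.random() < p_missing else value)

selected, _ = screen_features(X, y, d=5)
print("selected predictor indices:", selected)
```

Because the statistic depends on the data only through ranks, it is unchanged by monotone transformations of the response and is not dominated by a few extreme values, which is the kind of robustness the abstract attributes to the rank-based approach.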
Keywords/Search Tags: Feature screening, Missing data, Marginal Spearman rank correlation, Parameter estimation, Robust estimation