
Model Selection And Model Averaging For Several Classes Of Regression Models With Missing Data

Posted on: 2021-04-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zeng
GTID: 1487306470970909
Subject: Statistics
Abstract/Summary:
Statistics is the discipline of collecting, analyzing, and interpreting data. Once practitioners have obtained a data set, they can use statistical tools to fit many models, but how to find the most suitable one has long been a central research question in statistics. An overly complex model can make the variance of estimators or predicted values too large, while an overly simple model can make their bias too large. To address this trade-off, researchers over the past few decades have proposed a variety of model selection criteria and methods, such as Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the focused information criterion (FIC), Mallows' Cp, cross-validation, the least absolute shrinkage and selection operator (LASSO), and the smoothly clipped absolute deviation (SCAD) penalty. Using these criteria or methods, the best model can be chosen from a large number of candidate models. The selected model is then treated as the true data-generating process, and all subsequent statistical inference depends entirely on it.

Although model selection methods solve the above problem to a certain extent, they also have clear shortcomings. For example, their robustness is unsatisfactory, the uncertainty introduced at the model selection stage is ignored, useful information may be lost, and the resulting statistical inference can be highly risky. An effective way to avoid these shortcomings is model averaging, which combines many models. Unlike model selection, which picks only a single best model, model averaging combines the estimators or predicted values from many candidate models. It not only incorporates the uncertainty introduced by model selection but also avoids the potential risks of relying on a single model. As a result, it can reduce the mean squared error of estimators or predicted values and improve robustness.

With the rapid development of research on model averaging, many results have been obtained in recent years. One important research direction is frequentist model averaging (FMA), which focuses mainly on two problems: choosing the optimal weights for the model average estimator, and determining its asymptotic distribution. From the perspective of estimation or prediction, model selection can be treated as a special case of model averaging in which the selected model receives weight one and every other candidate receives weight zero. However, model selection should not be completely replaced by model averaging; the two approaches are complementary. For example, many scholars have proposed first performing a model selection procedure and then performing model averaging over the selected models.

Missing data are an important type of complex data in modern statistical practice, and methods for analyzing them have become a hot topic in statistical research in recent years. In this dissertation, we study model selection and model averaging for several classes of regression models with missing data (partially linear models, varying-coefficient partially linear models, and linear quantile regression models), based on an imputation strategy or an inverse probability weighted method. We also derive the model selection criterion and the asymptotic distribution of the model average estimator for each specific model. The research covers the following four aspects.

(1) We study model selection and model averaging for semiparametric partially linear models when the response variable is missing at random. In each candidate model, the parameter estimator and its asymptotic properties are obtained using an imputation method and a weight function method. We then derive an FIC and an FMA estimator, establish the asymptotic distribution of the model average estimator, and construct an appropriate confidence interval for the focus parameter. A simulation study examines the finite-sample performance of the proposed method.

(2) For varying-coefficient partially linear models with missing responses, we study the FIC and the corresponding smoothed FIC (S-FIC) model average estimator, based on the imputation method and the profile least-squares technique. Under a local misspecification framework, we establish the asymptotic normality of the focus parameter's estimator in each candidate model. We then develop the FIC to conduct model selection and construct the weight function of the S-FIC estimator. Finally, we derive the asymptotic properties of the FMA estimator. A simulation study and a real data analysis show that the proposed method performs well.

(3) Using an inverse probability weighted approach based on the covariate balancing propensity score method, we obtain an FIC and an FMA estimator for varying-coefficient partially linear models with missing responses. Under a local misspecification framework, we examine the theoretical properties of the FIC and the FMA estimator. The simulation studies show both the robustness of the inverse probability weighted approach based on the covariate balancing propensity score and the superiority of the proposed model averaging method.

(4) We consider model averaging for linear quantile regression models when the covariates are missing at random. First, we define weighted quantile regression estimators in the candidate models. Second, we prove that the estimators of the parameter and of functions of the parameter are asymptotically normal. Third, we derive the asymptotic distribution of the model average estimator. Finally, based on the model average estimator, we construct a confidence interval whose actual coverage probability tends to the nominal level. A simulation study shows that, in terms of mean squared error and coverage probability, the proposed model average estimator has an edge over the corresponding model selection estimator.
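To make the contrast between model selection and model averaging concrete, here is a minimal sketch in Python with NumPy, under stated assumptions. It is not the dissertation's method: instead of the FIC and S-FIC weights, it swaps in the simpler smoothed AIC weights, w_m proportional to exp(-AIC_m / 2), and instead of the covariate balancing propensity score it fits a plain logistic propensity model for the response-observed indicator. The AIC here is computed from an inverse-probability-weighted Gaussian residual variance, which is a heuristic assumption. All function names (`fit_propensity`, `ipw_least_squares`, `smoothed_aic_average`, `simulate`), the simulated linear model, and the focus point `x0` are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np


def fit_propensity(Z, delta, n_iter=25):
    """Logistic regression for P(delta = 1 | Z) via Newton-Raphson; Z includes an intercept column."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ gamma))
        hessian = Z.T @ (Z * (p * (1.0 - p))[:, None])
        gamma += np.linalg.solve(hessian, Z.T @ (delta - p))
    return 1.0 / (1.0 + np.exp(-Z @ gamma))


def ipw_least_squares(X, y, delta, pi):
    """Weighted least squares in which each observed response gets weight 1 / pi_i."""
    w = delta / pi                      # missing responses receive zero weight
    y0 = np.where(delta == 1, y, 0.0)  # placeholder for NaNs; these rows have zero weight
    Xw = X * w[:, None]
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y0)
    resid = y0 - X @ beta
    sigma2 = np.sum(w * resid**2) / np.sum(w)  # IPW estimate of the error variance
    return beta, sigma2


def smoothed_aic_average(X, y, delta, pi, x0, always=(0, 1)):
    """Fit every submodel that keeps the `always` columns; return S-AIC weights and estimates of x0'beta."""
    optional = [j for j in range(X.shape[1]) if j not in always]
    n_obs = int(delta.sum())
    estimates, aics = [], []
    for r in range(len(optional) + 1):
        for extra in itertools.combinations(optional, r):
            cols = list(always) + list(extra)
            beta, sigma2 = ipw_least_squares(X[:, cols], y, delta, pi)
            estimates.append(x0[cols] @ beta)
            # Heuristic AIC from the weighted Gaussian quasi-likelihood (assumption, not FIC).
            aics.append(n_obs * np.log(sigma2) + 2 * (len(cols) + 1))
    estimates, aics = np.array(estimates), np.array(aics)
    a = np.exp(-0.5 * (aics - aics.min()))  # subtract the minimum AIC for numerical stability
    weights = a / a.sum()
    selected = estimates[np.argmin(aics)]   # model selection: weight one on the AIC-best model
    return weights, estimates, selected, weights @ estimates


def simulate(n=500, seed=0):
    """Hypothetical linear model whose response is missing at random given x1."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    X = np.column_stack([np.ones(n), x])
    y = X @ np.array([1.0, 2.0, 0.5, 0.0]) + rng.normal(size=n)
    pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x[:, 0])))
    delta = (rng.uniform(size=n) < pi_true).astype(float)
    return X, np.where(delta == 1, y, np.nan), delta


if __name__ == "__main__":
    X, y, delta = simulate()
    pi_hat = fit_propensity(X[:, :2], delta)  # propensity depends on intercept and x1 only
    x0 = np.array([1.0, 0.5, 1.0, -1.0])
    weights, estimates, selected, averaged = smoothed_aic_average(X, y, delta, pi_hat, x0)
    print("weights:", np.round(weights, 3))
    print("selected:", selected, "averaged:", averaged)
```

Here model selection keeps only the AIC-best candidate, while the averaged estimate is a convex combination of all candidate estimates. Subtracting the minimum AIC before exponentiating leaves the normalized weights unchanged but avoids underflow. In this simulated setting the x2 effect is large relative to the noise, so most of the weight would be expected to fall on submodels that include x2.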
Keywords/Search Tags: Missing data, Regression models, Model selection, Model averaging