
Variable Selection Of Some Regression Models In High Dimensionality

Posted on: 2013-11-20 · Degree: Doctor · Type: Dissertation
Country: China · Candidate: M Q Wang · Full Text: PDF
GTID: 1220330395499240 · Subject: Probability theory and mathematical statistics
Abstract/Summary:
Variable selection plays a vital role in dealing with high dimensional data. An efficient variable selection method yields a simpler model by removing redundant covariates, and a good method also improves prediction accuracy. Since the Lasso penalty was proposed by Tibshirani (1996), variable selection based on penalty functions has received much attention, and much progress has been made in understanding variable selection in high dimensional statistical models. Compared with traditional approaches, penalized methods have an unparalleled advantage in studying high dimensional data, because they achieve variable selection and parameter estimation simultaneously.

This dissertation presents our results on variable selection in high dimensionality. It contains three parts. The first part studies variable selection in high dimensional parametric models. Chapters 2 and 3 present some asymptotic properties of the generalized linear model and the least squares approximation; the performance of the parameter estimators and the variable selection results are evaluated through simulation studies and real examples. In addition, Chapter 4 studies variable selection based on the least absolute deviation in the settings of a diverging number of parameters and ultrahigh dimensionality. Under some regularity conditions, the penalized LAD-SCAD estimator enjoys the oracle property. Simulation studies and a real data set are used to examine the theoretical results.

In the second part, we study variable selection in the high dimensional partially linear model. Chapter 5 applies the bridge penalty to select the parameters of the linear part. Under some reasonable conditions, the bridge estimator of the linear component has the oracle property, and the convergence rate of the estimator of the nonparametric part is optimal. Simulation studies and a data set show that the bridge estimator performs well.

The third part studies variable selection with current status data. Chapter 6 gives some results on variable selection in the high dimensional continuous generalized linear model with current status data. We use the SCAD penalty to perform feature selection and show the optimal convergence rate in the diverging case. With proper choices of the regularization parameters, the SCAD penalty selects the true model with probability tending to 1, and the estimators of the significant variables have the same asymptotic distribution as the oracle estimators. Finally, simulation studies and a real example illustrate that the finite sample performance of the estimator is quite good even when the data are censored.
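For reference, the penalized criteria behind these methods share a common form. The following is a standard sketch in notation of our own choosing (not taken from the dissertation): a loss function, such as least squares, the negative log-likelihood of a generalized linear model, or the least absolute deviation, is minimized together with a penalty applied to each coefficient,

\[
\hat{\beta} \;=\; \arg\min_{\beta}\; \ell(\beta) + \sum_{j=1}^{p} p_{\lambda}(|\beta_j|),
\qquad
p_{\lambda}^{\mathrm{Lasso}}(t) = \lambda t,
\qquad
p_{\lambda}^{\mathrm{bridge}}(t) = \lambda t^{\gamma},\quad 0<\gamma<1,
\]

while the SCAD penalty of Fan and Li (2001) is defined through its derivative, for t > 0 and a constant a > 2 (a = 3.7 is a common choice):

\[
p_{\lambda}'(t) \;=\; \lambda\left\{ I(t\le\lambda) + \frac{(a\lambda - t)_{+}}{(a-1)\lambda}\, I(t>\lambda) \right\}.
\]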
Keywords/Search Tags: Variable Selection, Penalty Function, Generalized Linear Model, Partially Linear Model, Least Absolute Deviation Estimator, Least Squares Approximation, Oracle Property, Current Status Data