
High-dimensional regression with grouped variables

Posted on: 2010-08-23
Degree: Ph.D.
Type: Dissertation
University: The University of Iowa
Candidate: Wei, Fengrong
Full Text: PDF
GTID: 1440390002979449
Subject: Biology
Abstract/Summary:
In many multiple regression problems, the covariates can be naturally grouped, and it is then important to take the group structure into account and to select groups of variables rather than individual ones. Such problems arise in many statistical modeling and applied settings. For example, in multifactor analysis-of-variance problems, each factor may have several levels and can be expressed through a group of dummy variables; the selection of important factors then corresponds to the selection of groups of variables.

There has been much work on the selection of important groups of variables using penalized methods. In our study, we generalize the results on the Lasso obtained in Zhang and Huang (2008) to the group Lasso in the high-dimensional setting. We study the selection and estimation properties of the group Lasso and adaptive group Lasso methods. We show that, under appropriate conditions, the group Lasso selects a model of the right order of dimensionality and controls the bias of the selected model at a level determined by the contributions of small regression coefficients and the threshold bias. In addition, we show that, under a narrow-sense sparsity condition, the adaptive group Lasso possesses an oracle selection property: it correctly selects the important groups with probability converging to one. In contrast, the group Lasso does not possess this oracle property.

Moreover, we apply the idea of the group Lasso to nonparametric varying-coefficient models, which allows us to simultaneously select the important variables and estimate the corresponding coefficient functions. We approximate each coefficient function by B-spline basis functions. The selection of important variables and the estimation of the corresponding coefficient functions thus amount to the selection of groups of variables and the estimation of the corresponding spline approximation coefficients.
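The group-selection mechanism described above can be sketched numerically: the group Lasso penalty acts through a block soft-thresholding operator that either shrinks a whole group of coefficients or sets the entire group to zero at once. The following is a minimal proximal-gradient sketch, not the dissertation's own algorithm; the variable names and the standard sqrt(group size) weights are assumptions for illustration.

```python
import numpy as np

def group_soft_threshold(beta, lam):
    """Proximal operator of lam * ||beta||_2 for one group: shrinks the
    whole group toward zero, and returns exactly zero when the group's
    Euclidean norm is at most lam."""
    norm = np.linalg.norm(beta)
    if norm <= lam:
        return np.zeros_like(beta)
    return (1.0 - lam / norm) * beta

def group_lasso_prox_gradient(X, y, groups, lam, n_iter=500):
    """Minimize (1/2n)||y - X beta||^2 + lam * sum_g sqrt(p_g) ||beta_g||_2
    by proximal gradient descent. `groups` is a list of index arrays."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / (np.linalg.norm(X, 2) ** 2)   # 1/L, L = Lipschitz const of grad
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n      # gradient of the squared-error term
        z = beta - step * grad
        for g in groups:
            w = np.sqrt(len(g))              # conventional group weight sqrt(p_g)
            beta[g] = group_soft_threshold(z[g], step * lam * w)
    return beta
```

On data with grouped sparse truth, the estimate typically zeroes out the irrelevant groups entirely while keeping the signal groups, which is the group-wise selection behavior discussed above.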
We show that, under appropriate conditions, the resulting estimator achieves sparsity consistency and converges at the best possible rates.

Existing algorithms are adapted to compute the solution paths for both the group Lasso and the adaptive group Lasso. Tuning parameter selection and initial value selection are addressed in the implementation of the algorithms. All the methods are illustrated by simulation studies and real data examples.
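To illustrate the varying-coefficient reduction described above: expanding each coefficient function in a B-spline basis turns the model into a linear regression whose design columns come in one block per original covariate, so group selection applies directly to the blocks of spline coefficients. The sketch below uses hypothetical knot choices and simulated data, and fits by plain least squares for illustration; a group penalty would attach to the `groups` blocks shown.

```python
import numpy as np
from scipy.interpolate import BSpline

# Cubic B-spline basis on [0, 1] with 4 interior knots -> 8 basis functions.
k = 3
interior = np.linspace(0.0, 1.0, 6)[1:-1]
t = np.concatenate((np.zeros(k + 1), interior, np.ones(k + 1)))
basis = BSpline(t, np.eye(len(t) - k - 1), k)   # evaluates all 8 basis functions

rng = np.random.default_rng(0)
n = 400
u = rng.uniform(0.0, 1.0, n)           # index variable of the coefficient functions
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
# True model: y = beta1(u)*x1 + beta2(u)*x2 + noise, with beta2 identically zero.
y = np.sin(2 * np.pi * u) * x1 + 0.05 * rng.standard_normal(n)

B = basis(u)                                          # (n, 8) spline design in u
Z = np.hstack((x1[:, None] * B, x2[:, None] * B))     # grouped design: 2 groups of 8
groups = [np.arange(0, 8), np.arange(8, 16)]          # blocks a group penalty acts on

gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)         # unpenalized fit, for illustration
ugrid = np.linspace(0.05, 0.95, 19)
beta1_hat = basis(ugrid) @ gamma[groups[0]]           # recovered coefficient function
```

Selecting group 2's block of spline coefficients as a unit is exactly the grouped selection problem: dropping the whole block removes x2 from the model, while keeping it estimates beta2(u) through its spline coefficients.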
Keywords/Search Tags: Variables, Regression, Important, Adaptive group lasso, Selection, Methods