A Study on the Selection of High-Dimensional Data Variables

Posted on: 2014-10-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Yu
Full Text: PDF
GTID: 1100330434971325
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Variable selection is an important issue in high-dimensional data analysis, and penalized likelihood methods are the most popular approach to it. The idea of penalizing for variable selection was proposed in the 1960s, and statisticians began to study the theoretical properties of penalized likelihood estimators in the 1990s, with a focus on linear and generalized linear models. However, there have been few breakthrough theoretical results for survival data models, which may be due to the censoring inherent in survival data. In Chapter 2, we study the oracle property of the penalized maximum likelihood estimator in the context of Cox's model, the most important model in survival analysis. We study the Cox model with high-dimensional and time-dependent covariates in the framework of counting processes. The regularity conditions for the main results are imposed directly on the true regression coefficients; they are more natural than, and different from, those required by existing methods.
In practice, one is interested in developing computationally efficient algorithms to obtain penalized maximum likelihood estimators. Most existing algorithms are path algorithms. Finding penalized likelihood estimators is an optimization problem, which is relatively easy for convex penalties but much more complicated for nonconvex penalties. We propose a new path algorithm that can handle both convex and nonconvex penalties. The major advantage of the new algorithm is that it can find the globally optimal solution more easily than existing methods. Numerical studies demonstrate that the newly proposed algorithm is computationally more efficient than existing methods.
Path algorithms provide a sequence of solutions, and it is necessary to find the optimal one. The traditional methods derived for low-dimensional data are not suitable for high-dimensional data, and most existing methods designed for high-dimensional data modify information-type criteria.
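
To make the notion of a path algorithm concrete, the following is a minimal sketch, not the algorithm proposed in the dissertation. It assumes a linear model with the convex L1 (lasso) penalty and uses ordinary cyclic coordinate descent with warm starts over a decreasing grid of penalty levels. The function names `soft_threshold` and `lasso_path`, the objective scaling, and the grid are all illustrative assumptions; nonconvex penalties and the Cox model are not covered.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_path(X, y, lambdas, n_iter=200, tol=1e-8):
    """Solve min_b (1/2n)||y - X b||^2 + lam * ||b||_1 for each lam.

    Illustrative only: plain lasso coordinate descent, warm-started
    from the solution at the previous (larger) penalty level.
    Returns an array of shape (len(lambdas), p).
    """
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n      # per-column curvature
    beta = np.zeros(p)
    path = []
    for lam in lambdas:
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue               # skip all-zero columns
                # Partial residual with coordinate j removed from the fit.
                r_j = y - X @ beta + X[:, j] * beta[j]
                z = X[:, j] @ r_j / n
                new_bj = soft_threshold(z, lam) / col_sq[j]
                max_change = max(max_change, abs(new_bj - beta[j]))
                beta[j] = new_bj
            if max_change < tol:
                break
        path.append(beta.copy())           # one solution per penalty level
    return np.array(path)


# Small synthetic illustration (hypothetical data, three true signals).
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam_max = np.max(np.abs(X.T @ y)) / n      # smallest lam giving the zero fit
lambdas = np.geomspace(lam_max, 0.01 * lam_max, 20)
path = lasso_path(X, y, lambdas)
print("nonzero coefficients along the path:",
      [int(np.count_nonzero(np.abs(b) > 1e-8)) for b in path])
```

Each row of `path` is one candidate model, which is why a separate rule is needed to pick a single penalty level from the sequence.
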
In Chapters 4 and 5, we modify a cross-validation algorithm for linear and generalized linear regression, respectively. These methods are shown to yield consistent estimators, and their validity is demonstrated via simulations and real applications.
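
For orientation, here is a sketch of the unmodified baseline: ordinary K-fold cross-validation used to choose a penalty level for a lasso fit in a linear model. It is not the modified procedure developed in the dissertation, whose details are not given in this abstract. The helpers `lasso_ista` (a proximal-gradient solver) and `cv_select_lambda`, along with the fold count and grid, are hypothetical choices made for illustration.

```python
import numpy as np


def lasso_ista(X, y, lam, beta0=None, n_iter=500):
    """Proximal gradient (ISTA) for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's
    # Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Plain K-fold CV: return the lam with the smallest mean squared
    prediction error on held-out folds, plus the full error curve."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    cv_err = np.zeros(len(lambdas))
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        beta = np.zeros(p)
        for i, lam in enumerate(lambdas):
            # Warm start from the previous penalty level on this fold.
            beta = lasso_ista(X[train], y[train], lam, beta0=beta)
            resid = y[test_idx] - X[test_idx] @ beta
            cv_err[i] += np.sum(resid ** 2)
    cv_err /= n                             # average over all held-out points
    best = int(np.argmin(cv_err))
    return lambdas[best], cv_err


# Hypothetical synthetic data with three true signals.
rng = np.random.default_rng(0)
n, p = 80, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam_max = np.max(np.abs(X.T @ y)) / n
lambdas = np.geomspace(lam_max, 0.01 * lam_max, 15)
best_lam, cv_err = cv_select_lambda(X, y, lambdas)
print("selected lambda:", best_lam)
```

This baseline targets prediction error, so it should be read only as a reference point for the kind of procedure being modified.
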
Keywords/Search Tags: high-dimensional data, variable selection, penalized likelihood estimators, Cox model, path algorithm, cross-validation