Missing covariates and high-dimensional variable selection in additive hazards regression

Posted on: 2012-08-07
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Lin, Wei
Full Text: PDF
GTID: 1450390011956945
Subject: Biology
Abstract/Summary:
This dissertation addresses two challenging problems arising in inference with censored failure time data. The additive hazards model provides a unified framework for these problems to be discussed, not only because it is a useful alternative to the well-known Cox model and has significant practical implications, but also because its simple yet elegant structure allows one to explore some fundamental aspects of these problems.

In the first part of this dissertation, we consider the estimation problem in additive hazards regression with missing covariates. We are interested in both the case where the observation probabilities are known and the case where they are unknown but can be parametrically modeled and estimated. By modifying the pseudoscore function with full data, we introduce some weighted estimators for the regression coefficients and the cumulative baseline hazard function. The proposed estimators are then shown to be consistent and asymptotically normal under mild conditions, with asymptotic variances that can be easily estimated. Our theoretical results and simulation studies indicate that using estimated weights in the simple weighted estimators may yield important efficiency gain and that the augmented weighted estimators are even more efficient. The proposed methods are further illustrated by a mouse leukemia data example.

In the second part, we turn to the variable selection problem in the additive hazards model. Motivated by linking high-throughput genomic data to survival outcomes, we are particularly interested in the high-dimensional setting where the dimension of covariates may grow fast, possibly nonpolynomially, with the sample size. We propose to perform variable selection and estimation simultaneously by using a class of regularized estimators with a general family of concave penalties, including several popular choices such as the lasso, SCAD, MCP, and SICA. In a nonasymptotic framework where the model dimensions are allowed to vary freely, we rigorously investigate the weak oracle properties and oracle properties of the proposed estimators. Our theoretical results are essentially different from those in the existing literature, and provide new insight into the model selection properties of regularized estimators for survival models. We illustrate the proposed method by simulation studies and application to a diffuse large B-cell lymphoma data set.

A common theme underlying the theoretical development in this dissertation is the use of modern empirical process theory. Indeed, we rely on the language of empirical process theory to establish our theoretical results for both problems considered here, and they serve as excellent examples for demonstration of the power and elegance of this mathematical tool, especially in the context of survival analysis.
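
For readers unfamiliar with the penalties named in the second part, the following is a minimal sketch of the four penalty functions in their commonly cited textbook forms. It is not taken from the dissertation: the function names, the parametrizations, and the default shape parameters (a = 3.7 for SCAD, gamma = 3.0 for MCP, a = 1.0 for SICA) are assumptions chosen for illustration, and the dissertation may scale or parametrize them differently.

```python
# Illustrative penalty functions p_lambda(|t|); names and defaults are
# assumptions for this sketch, not the dissertation's notation.

def lasso(t, lam):
    """L1 penalty: lam * |t|."""
    return lam * abs(t)


def scad(t, lam, a=3.7):
    """SCAD penalty in its usual piecewise form; requires a > 2."""
    t = abs(t)
    if t <= lam:
        return lam * t  # behaves like the lasso near zero
    if t <= a * lam:
        # quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2  # constant beyond a * lam


def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty; requires gamma > 0 (usually gamma > 1)."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2  # constant beyond gamma * lam


def sica(t, lam, a=1.0):
    """SICA penalty: lam * (a + 1) * |t| / (a + |t|), with a > 0."""
    t = abs(t)
    return lam * (a + 1) * t / (a + t)
```

SCAD and MCP coincide with or stay below the lasso near zero and flatten out for large coefficients, which is what reduces the bias on large effects; SICA moves from an L0-like penalty toward the lasso as its shape parameter a grows.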
Keywords/Search Tags: Additive hazards, Variable selection, Data, Model, Covariates