Missing covariates and high-dimensional variable selection in additive hazards regression

Posted on:2012-08-07

Degree:Ph.D

Type:Dissertation

University:University of Southern California

Candidate:Lin, Wei

GTID:1450390011956945

Subject:Biology

Abstract/Summary:

This dissertation addresses two challenging problems arising in inference with censored failure time data. The additive hazards model provides a unified framework for discussing these problems, not only because it is a useful alternative to the well-known Cox model and has significant practical implications, but also because its simple yet elegant structure allows one to explore some fundamental aspects of these problems.

In the first part of this dissertation, we consider the estimation problem in additive hazards regression with missing covariates. We are interested in both the case where the observation probabilities are known and the case where they are unknown but can be parametrically modeled and estimated. By modifying the full-data pseudoscore function, we introduce weighted estimators for the regression coefficients and the cumulative baseline hazard function. The proposed estimators are then shown to be consistent and asymptotically normal under mild conditions, with asymptotic variances that can be easily estimated. Our theoretical results and simulation studies indicate that using estimated weights in the simple weighted estimators may yield important efficiency gains and that the augmented weighted estimators are even more efficient. The proposed methods are further illustrated by a mouse leukemia data example.

In the second part, we turn to the variable selection problem in the additive hazards model. Motivated by linking high-throughput genomic data to survival outcomes, we are particularly interested in the high-dimensional setting where the dimension of covariates may grow fast, possibly nonpolynomially, with the sample size. We propose to perform variable selection and estimation simultaneously by using a class of regularized estimators with a general family of concave penalties, including several popular choices such as the lasso, SCAD, MCP, and SICA. In a nonasymptotic framework where the model dimensions are allowed to vary freely, we rigorously investigate the weak oracle properties and oracle properties of the proposed estimators. Our theoretical results are essentially different from those in the existing literature, and provide new insight into the model selection properties of regularized estimators for survival models. We illustrate the proposed method by simulation studies and application to a diffuse large B-cell lymphoma data set.

A common theme underlying the theoretical development in this dissertation is the use of modern empirical process theory. Indeed, we rely on the language of empirical process theory to establish our theoretical results for both problems considered here, which serve as excellent examples of the power and elegance of this mathematical tool, especially in the context of survival analysis.
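
For readers unfamiliar with the penalties named in the second part, the following is a minimal sketch of the lasso, SCAD, MCP, and SICA penalty functions in their commonly published forms. It is not taken from the dissertation: the function names, the parameterization, and the default shape parameters (a = 3.7 for SCAD, gamma = 3 for MCP, a = 1 for SICA) are illustrative assumptions, and the dissertation may scale or parameterize these penalties differently.

```python
import numpy as np

# Illustrative penalty functions p_lambda(|t|); names and defaults are
# assumptions for this sketch, not the dissertation's own notation.

def lasso(t, lam):
    """L1 penalty: lam * |t|."""
    return lam * np.abs(t)

def scad(t, lam, a=3.7):
    """SCAD penalty: linear near zero, quadratic transition, then flat."""
    x = np.abs(t)
    middle = (2 * a * lam * x - x**2 - lam**2) / (2 * (a - 1))
    flat = lam**2 * (a + 1) / 2
    return np.where(x <= lam, lam * x, np.where(x <= a * lam, middle, flat))

def mcp(t, lam, gamma=3.0):
    """MCP penalty: lam*|t| - t^2/(2*gamma) up to gamma*lam, then flat."""
    x = np.abs(t)
    return np.where(x <= gamma * lam, lam * x - x**2 / (2 * gamma), gamma * lam**2 / 2)

def sica(t, lam, a=1.0):
    """SICA penalty: lam * (a + 1)|t| / (a + |t|); tends to the lasso as a grows."""
    x = np.abs(t)
    return lam * (a + 1) * x / (a + x)

if __name__ == "__main__":
    grid = np.linspace(-4.0, 4.0, 9)
    for name, pen in [("lasso", lasso), ("scad", scad), ("mcp", mcp), ("sica", sica)]:
        print(name, np.round(pen(grid, 1.0), 3))
```

All four penalties vanish at zero and grow like the lasso for small coefficients. SCAD and MCP level off for large coefficients, and SICA is bounded, which is what distinguishes the concave choices from the lasso in how strongly they shrink large effects.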

Keywords/Search Tags:

Additive hazards, Variable selection, Data, Model, Covariates

Related items

1	Variable Selection Methods In Statistical Models For Survival Data
2	Non-marginal Variable Screening For Additive Hazards Model With Ultrahigh-dimensional Covariates
3	Regression Analysis Of Current Status Data With Auxiliary Covariates
4	Regression Analysis Of Case ? Interval-Censored Failure Time Data Under The Additive Hazards Model With Auxiliary Covariates
5	Adaptive LASSO Variable Selection Method For Current Status Data Under The Additive Hazards Model
6	Regression Analysis Of Dependent Interval-censored Failure Time Data With The Additive Hazards Model
7	Variable Selection And Estimation For Semi-parametric Additive Hazard Model With High-dimensional Covariates
8	Variable Selection In Additive Hazard Model For Right Censored Data
9	Research On Two Kinds Of Joint Modeling Methods With Latent Variable Under Interval-censored Data
10	Variable Selection Of Zero-inflated Model With Missing Covariates