
Bayesian Variable Selection in Parametric and Semiparametric High Dimensional Survival Analysis

Posted on: 2012-03-07
Degree: Ph.D.
Type: Dissertation
University: University of Missouri - Columbia
Candidate: Lee, Kyu Ha
GTID: 1450390008999940
Subject: Statistics
Abstract/Summary:
Variable selection for high-dimensional data has recently received a great deal of attention. However, because of the complex structure of the problem, only limited progress has been made for time-to-event data in the presence of censoring. In this dissertation, we propose several Bayesian variable selection schemes for parametric and semiparametric Bayesian survival models for right-censored survival data.

In the first chapter, we introduce a special shrinkage prior on the coefficients of the predictor variables. The shrinkage prior is obtained through a scale mixture representation of Normal and Gamma distributions, and it corresponds to the well-known frequentist lasso penalty. The likelihood function is constructed within the Cox proportional hazards framework, with the cumulative baseline hazard function modeled a priori by a gamma process. We place a prior on the tuning parameter of the shrinkage prior, which lets the model adaptively control its own sparsity. The primary use of the proposed model is to identify the important covariates related to survival.

In the second chapter, we extend the shrinkage prior so that it can incorporate existing grouping structure among the covariates. Grouping arises naturally in microarray studies, where genes belonging to the same biological pathway are often grouped together and act as a single unit. Our chosen priors are analogous to the elastic-net, group lasso, and fused lasso penalties. The model introduced in this chapter is especially useful when the grouping structure needs to be taken into account.
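As a numerical check on the scale-mixture idea behind the first-chapter prior, the sketch below assumes the standard Bayesian-lasso hierarchy of Park and Casella (2008) with the error variance fixed at 1: each coefficient receives its own variance τ_j² drawn from a Gamma(1, rate λ²/2) (that is, exponential) distribution, and β_j | τ_j² is Normal with that variance. Averaging over τ_j² should reproduce the Laplace density (λ/2) exp(-λ|β_j|), whose negative log is the lasso penalty λ|β_j| up to a constant. The function name `sample_beta_from_mixture` and the value λ = 2 are illustrative assumptions; the dissertation's exact parameterization, its hyperprior on λ, the gamma-process baseline hazard, and the grouped priors of the second chapter are not reproduced here.

```python
import math
import random


def sample_beta_from_mixture(lam, n, seed=0):
    """Draw n coefficients from the hierarchical (scale-mixture) form of the lasso prior.

    Assumed hierarchy (illustrative, error variance fixed at 1):
      tau2 ~ Gamma(shape=1, rate=lam**2 / 2), i.e. Exponential(rate=lam**2 / 2)
      beta | tau2 ~ Normal(0, tau2)
    Integrating out tau2 gives beta ~ Laplace(0, scale=1/lam),
    with density (lam / 2) * exp(-lam * |beta|).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        tau2 = rng.expovariate(lam ** 2 / 2.0)  # expovariate takes the rate parameter
        draws.append(rng.gauss(0.0, math.sqrt(tau2)))  # gauss takes the standard deviation
    return draws


if __name__ == "__main__":
    lam = 2.0
    beta = sample_beta_from_mixture(lam, 200_000)
    mean_abs = sum(abs(b) for b in beta) / len(beta)
    var = sum(b * b for b in beta) / len(beta)
    # A Laplace(0, 1/lam) variable has E|beta| = 1/lam and Var(beta) = 2/lam**2.
    print(f"E|beta|:   sample {mean_abs:.3f} vs Laplace {1 / lam:.3f}")
    print(f"Var(beta): sample {var:.3f} vs Laplace {2 / lam ** 2:.3f}")
```

The sample moments should sit close to the Laplace values; in a full sampler λ would itself receive a prior, as the abstract describes, rather than being fixed.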
The main use of the grouped model proposed in the second chapter is to identify important covariates while also shrinking the coefficients of similar (or grouped) variables toward a common value, thereby revealing any grouping behavior among the covariates.

In the third chapter, we propose a Bayesian variable selection method for high-dimensional survival analysis in the context of the parametric accelerated failure time (AFT) model. To identify subsets of relevant covariates, the regression coefficients are assumed to follow a conditional Laplace distribution, as in the first chapter. We use a data augmentation approach to impute the survival times of censored subjects. In the proposed model, all required full conditional densities are of known forms. This conjugacy lets us obtain posterior estimates of the parameters via Gibbs sampling without resorting to more complex Monte Carlo methods, so model fitting is considerably faster for high-dimensional parameter spaces.

To implement the methodologies of the first two chapters, we developed special Markov chain Monte Carlo algorithms with an adaptive jumping rule. The Monte Carlo algorithm for the third chapter is straightforward and considerably faster than those of the first two. We have successfully applied our models to several simulated data sets and to real microarray data sets with right-censored survival times. We also compare the performance of our Bayesian variable selection models with competing methods such as CUS, SPC, iterative BMA, and BVSME-Surv to support our claim of superior performance.
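For the third chapter, the claim that every full conditional has a known form can be illustrated with a small pure-Python Gibbs sampler. This is a minimal sketch under stated assumptions, not the dissertation's algorithm: it assumes log-normal AFT errors (the abstract says only "parametric AFT"), a Park and Casella style conditional Laplace prior β_j | σ², τ_j² ~ N(0, σ²τ_j²) with τ_j² ~ Exp(λ²/2), a flat prior on the intercept, p(σ²) ∝ 1/σ², and a hypothetical Gamma(r, rate d) hyperprior on λ². Each sweep imputes censored log-times from a Normal truncated below at the censoring log-time (the data augmentation step), then draws the intercept, slopes, σ², 1/τ_j² (inverse Gaussian, via the Michael, Schucany and Haas method), and λ². All function names (`gibbs_lasso_aft`, `simulate_aft`, `rtrunc_normal_above`, `rinvgauss`) and default settings are illustrative.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rtrunc_normal_above(rng, mu, sigma, lower):
    """Draw from Normal(mu, sigma^2) restricted to (lower, inf) by inverse-CDF sampling."""
    a = STD_NORMAL.cdf((lower - mu) / sigma)
    u = a + (1.0 - a) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf's argument strictly inside (0, 1)
    return max(mu + sigma * STD_NORMAL.inv_cdf(u), lower)  # guard against far-tail rounding


def rinvgauss(rng, mu, lam):
    """Inverse-Gaussian(mean=mu, shape=lam) draw via Michael, Schucany and Haas (1976)."""
    y = rng.gauss(0.0, 1.0) ** 2
    s = math.sqrt(mu * mu * y * y + 4.0 * mu * lam * y)
    x = 4.0 * mu * mu * lam * y / (mu * y + s) ** 2  # cancellation-free form of the smaller root
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def simulate_aft(n, b0, beta, sigma=0.5, seed=42):
    """Hypothetical test data: log-normal AFT times with independent log-normal censoring."""
    rng = random.Random(seed)
    X, t, delta = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        log_t = b0 + sum(xj * bj for xj, bj in zip(x, beta)) + sigma * rng.gauss(0.0, 1.0)
        log_c = rng.gauss(1.5, 1.0)
        X.append(x)
        t.append(math.exp(min(log_t, log_c)))
        delta.append(1 if log_t <= log_c else 0)
    return X, t, delta


def gibbs_lasso_aft(X, t, delta, n_iter=1500, burn=500, r=1.0, d=0.1, seed=1):
    """Posterior means (intercept, slopes) for a log-normal AFT model with a Bayesian-lasso prior.

    Assumed model: log T_i = b0 + x_i' beta + sigma * eps_i, eps_i ~ N(0, 1);
    beta_j | sigma^2, tau_j^2 ~ N(0, sigma^2 tau_j^2); tau_j^2 ~ Exp(lam2 / 2);
    lam2 ~ Gamma(r, rate=d); flat prior on b0; p(sigma^2) proportional to 1 / sigma^2.
    delta[i] = 1 for an observed event, 0 for a right-censored time.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    logt = [math.log(ti) for ti in t]
    y = logt[:]  # augmented log-times; censored entries are re-imputed every sweep
    b0, beta, tau2, sigma2, lam2 = 0.0, [0.0] * p, [1.0] * p, 1.0, 1.0
    xx = [sum(row[j] ** 2 for row in X) for j in range(p)]
    sums, kept = [0.0] * (p + 1), 0
    for it in range(n_iter):
        sigma = math.sqrt(sigma2)
        eta = [b0 + sum(xij * bj for xij, bj in zip(row, beta)) for row in X]
        # 1. Data augmentation: impute censored log-times above their censoring log-times.
        for i in range(n):
            if delta[i] == 0:
                y[i] = rtrunc_normal_above(rng, eta[i], sigma, logt[i])
        # 2. Intercept: N(mean(y - X beta), sigma^2 / n).
        e = [y[i] - eta[i] + b0 for i in range(n)]  # y - X beta
        b0 = rng.gauss(sum(e) / n, sigma / math.sqrt(n))
        e = [ei - b0 for ei in e]  # full residual y - b0 - X beta
        # 3. Slopes one at a time: N(sum_i x_ij r_ij / a_j, sigma^2 / a_j), a_j = sum_i x_ij^2 + 1/tau_j^2.
        for j in range(p):
            part = [ei + row[j] * beta[j] for ei, row in zip(e, X)]  # partial residual r_ij
            a = xx[j] + 1.0 / tau2[j]
            beta[j] = rng.gauss(sum(row[j] * ri for row, ri in zip(X, part)) / a, sigma / math.sqrt(a))
            e = [ri - row[j] * beta[j] for ri, row in zip(part, X)]
        # 4. sigma^2 ~ Inverse-Gamma((n + p) / 2, (RSS + sum_j beta_j^2 / tau_j^2) / 2).
        rate = 0.5 * (sum(ei * ei for ei in e) + sum(bj * bj / tj for bj, tj in zip(beta, tau2)))
        sigma2 = 1.0 / rng.gammavariate(0.5 * (n + p), 1.0 / rate)
        # 5. 1/tau_j^2 ~ Inverse-Gaussian(sqrt(lam2 * sigma2) / |beta_j|, lam2).
        for j in range(p):
            mu = math.sqrt(lam2 * sigma2) / max(abs(beta[j]), 1e-12)
            tau2[j] = 1.0 / max(rinvgauss(rng, mu, lam2), 1e-12)
        # 6. lam2 ~ Gamma(p + r, rate = sum_j tau_j^2 / 2 + d); gammavariate takes a scale.
        lam2 = rng.gammavariate(p + r, 1.0 / (0.5 * sum(tau2) + d))
        if it >= burn:
            kept += 1
            sums[0] += b0
            for j in range(p):
                sums[j + 1] += beta[j]
    means = [s / kept for s in sums]
    return means[0], means[1:]
```

For example, `gibbs_lasso_aft(*simulate_aft(200, 0.5, [1.0, -1.0, 0.0, 0.0, 0.0]))` fits simulated data with two active and three null covariates; the null slopes should be pulled toward zero, and the intercept is recovered only because censored times are imputed rather than treated as events. The slopes are updated one coordinate at a time, which costs O(np) per sweep and avoids a p × p linear solve, at the price of slower mixing when covariates are strongly correlated; a block multivariate Normal draw is the usual alternative.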
Keywords/Search Tags: Variable selection, High dimensional, Survival, Data, Shrinkage prior, Parametric, Model