Variable Selection Methods In Statistical Models For Survival Data

Posted on: 2015-02-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J C Liu
Full Text: PDF
GTID: 1260330425475216
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Survival data arise widely in biomedicine, economics and finance, actuarial science, reliability engineering and other fields. Because of censoring, however, classical statistical methods for complete data are not suitable for analyzing survival data, so inference for such data remains an active research topic. Moreover, multivariate survival time data arise frequently in biomedical studies when more than one failure outcome is observed for an individual. A key feature of this type of data is that the survival times within the same subject or cluster may be correlated. Because of this complex dependence and the censoring, inference is nontrivial. Nevertheless, owing to its wide use in practice, the statistical analysis of multivariate survival time data has attracted increasing attention.
With the development of modern technology, massive data sets are encountered in many fields, such as bioinformatics, aerospace, artificial intelligence and electronic commerce. Such data are typically high dimensional and noisy. How to extract useful information from such high-dimensional data is a fundamental problem. As an efficient tool for identifying important information, variable selection has received great attention from statisticians. However, classical variable selection methods are often infeasible for such high-dimensional data, and many improved methods have therefore been proposed. Among them, the most popular are the regularization methods, such as the LASSO, SCAD and MCP.
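As a point of reference for the three regularization methods named above, the following is a minimal Python sketch of their penalty functions in the standard textbook forms. It is not taken from the dissertation: the function names, the scalar interface, and the default tuning values `a=3.7` and `gamma=3.0` are assumptions chosen for illustration only.

```python
def lasso_penalty(beta, lam):
    """LASSO (L1) penalty: lam * |beta|. Grows linearly without bound."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its standard piecewise form (requires a > 2).

    The default a=3.7 is a commonly used value, assumed here for illustration.
    """
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    t = abs(beta)
    if t <= lam:
        # Same as the LASSO near zero
        return lam * t
    if t <= a * lam:
        # Quadratic transition region
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant for large coefficients, so they are not shrunk further
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty in its standard piecewise form (requires gamma > 1).

    The default gamma=3.0 is an assumed illustrative value.
    """
    if gamma <= 1:
        raise ValueError("MCP requires gamma > 1")
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    # Constant beyond gamma * lam
    return gamma * lam * lam / 2


if __name__ == "__main__":
    # Compare the three penalties on a small and a large coefficient.
    for b in (0.5, 10.0):
        print(b, lasso_penalty(b, 1.0), scad_penalty(b, 1.0), mcp_penalty(b, 1.0))
```

The sketch makes one design difference visible: the LASSO penalty keeps growing with the coefficient size, whereas SCAD and MCP level off to a constant, which is what makes them nonconcave penalties that leave large coefficients nearly unpenalized.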
Within the framework of survival data, including multivariate survival time data, this dissertation addresses the following three questions about regularization methods: first, how to select important variables when the covariates have a group structure; second, how to carry out variable selection when the dimension p >> n, where n is the sample size; third, how to identify important variables in a semiparametric regression model.
In Chapter 2, we discuss variable selection in the additive hazards model when the covariates are grouped. The aim is to identify important variables simultaneously at the group level and within groups. To this end, we consider a hierarchical penalty method. For the case of diverging dimension, we establish the large-sample properties of the proposed method. Numerical results indicate that, when the covariates have a group structure, the hierarchically penalized method outperforms existing methods such as the LASSO, SCAD and adaptive LASSO. Finally, we analyze a gene expression data set with the proposed method.
In Chapter 3, we consider the large-sample properties of a class of nonconcave penalized procedures in the additive hazards model when the dimension of the covariates may grow nonpolynomially with the sample size n, namely at the rate exp(n^δ) with δ > 0. Under a condition similar to the Irrepresentable Condition proposed by Zhao and Yu [97], we prove that the proposed estimator has the strong oracle property. Interestingly, this property also holds for the LASSO. In addition, we establish asymptotic normality, which does not hold for the LASSO penalty.
In Chapters 4 and 5, we study variable selection in the partially linear varying-coefficient marginal hazards model and the partially linear marginal hazards model for multivariate survival time data, respectively.
For the parametric parts, we mainly use the idea of the one-step backfitting method, and the important nonparametric functions are identified through hypothesis testing. Under some regularity conditions, we obtain the oracle properties of the corresponding estimators. The simulation results demonstrate that the proposed methods perform well. Finally, we apply these methods to the analysis of colon cancer data.
Keywords/Search Tags: Survival Analysis, Censored Data, Multivariate Survival Time Data, High Dimensional Data, Variable Selection, Regularization Method, Nonconcave Penalty, Group Variable Selection, Two-Level Selection, Additive Hazards Model, Marginal Hazard Model