Regularized Estimation For The Accelerated Failure Time Model With Elastic Net Penalty

Posted on: 2017-02-23  Degree: Master  Type: Thesis
Country: China  Candidate: K L Wang  Full Text: PDF
GTID: 2297330485476200  Subject: Statistics
Abstract/Summary:
An important goal of high-dimensional data research is to identify genetic markers related to the occurrence and development of disease. A typical example is prognosis based on microarray data. Searching for significantly associated biomarkers in microarray gene expression data is difficult: because the data are high-dimensional, standard survival analysis techniques cannot be applied directly. Moreover, of the thousands of genes studied, normally only a small portion are associated with the disease. When the outcome is survival time, accurate data are hard to obtain because of censoring, so selecting the relevant genes becomes very challenging.

We propose an elastic net regularized Gehan estimator for the accelerated failure time model and use it to select genes that are important for survival time. We prove the properties of the estimator and present an algorithm, similar to the LASSO algorithm, that selects variables and estimates the parameters. Unlike existing methods based on inverse probability weighting or Buckley-James estimation, the proposed method needs no extra assumptions on the censored data, which makes it more generally applicable.

We ran extensive simulations, some with the settings of the 2009 paper by Cai, T., to test the finite-sample performance of the proposed method. Compared with the method of Cai, T., ours selects variables better and can handle data with a large number of variables and a small number of observations, a setting that the method of Cai, T. cannot handle. Our method also has shortcomings: when the correlation among variables is strong, its mean squared error is larger than that of Cai, T.

Finally, we applied the method to the lung adenocarcinoma dataset from the paper by Beer, D. and selected genes correlated with lung adenocarcinoma. We selected genes that were not found by Beer, D. and showed their significance with a t test. However, the true form of the relationship between these genes and the disease still needs to be confirmed by follow-up clinical studies.
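
As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates a Gehan rank-based loss for the accelerated failure time model plus an elastic net penalty. It is not the thesis's implementation: the function names, the 1/n^2 scaling, and the penalty parameterization (`lam1` for the L1 term, `lam2` for the squared L2 term) are assumptions chosen for illustration, and the fitting algorithm itself is not shown.

```python
import numpy as np

def gehan_loss(beta, X, log_t, delta):
    """Gehan loss for the AFT model log T = X @ beta + error.

    delta[i] is 1 for an observed event and 0 for a censored time.
    The 1/n^2 scaling is an assumption; other normalizations are used too.
    """
    e = log_t - X @ beta               # residuals e_i
    diff = e[None, :] - e[:, None]     # diff[i, j] = e_j - e_i
    n = len(log_t)
    # Only uncensored subjects i contribute, each paired with every subject j.
    return np.sum(delta[:, None] * np.maximum(diff, 0.0)) / n**2

def elastic_net_gehan_objective(beta, X, log_t, delta, lam1, lam2):
    """Gehan loss plus an (assumed) elastic net penalty lam1*|b|_1 + lam2*|b|_2^2."""
    penalty = lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta**2)
    return gehan_loss(beta, X, log_t, delta) + penalty

# Toy data: two subjects, one covariate, both events observed.
X = np.array([[1.0], [0.0]])
log_t = np.array([1.0, 0.0])
delta = np.array([1.0, 1.0])
print(elastic_net_gehan_objective(np.array([1.0]), X, log_t, delta, 0.5, 0.25))
```

Because the loss uses only pairwise differences of residuals and the censoring indicator, it needs no model for the censoring distribution, which matches the abstract's point about avoiding extra assumptions on the censored data.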
Keywords/Search Tags: accelerated failure time model, elastic net, Gehan estimate