Research And Application Of Optimizing Survival Analysis Method By Gradient Boosting Tree

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: P Liu
GTID: 2370330623467754
Subject: Computer Science and Technology
Abstract/Summary:
Survival analysis (a.k.a. time-to-event analysis) has a wide range of applications in healthcare, finance, and other fields, and it plays a particularly important role in clinical disease research. It aims to study the probability that an individual experiences an event of interest during observation, to find the relationship between predictors and the outcome of that event, and to explore the important factors and patterns behind its occurrence. Survival analysis methods learn the relationship between the observed variables and the distribution of time-to-event from the available data.

However, previous survival analysis methods have some shortcomings. First, in terms of model assumptions, some statistical linear models and ensemble tree models assume that the distribution function of time-to-event takes a specific parametric form. When the distribution of the data is unknown or prior knowledge is lacking, these assumptions severely limit the prediction accuracy of the model. Second, in terms of model interpretability, although some deep learning models have powerful representation ability, they cannot convincingly explain the impact of the observed variables, which restricts their application in the real world. Finally, when the distribution of the data is known or prior knowledge is sufficient, many algorithms derived from the Cox proportional hazard model take the partial likelihood estimation function as the objective function. If the survival data contain many events, however, the partial likelihood estimation function is not precise enough, so the parameter estimates are biased and accuracy decreases. In addition, some ensemble tree models of the Cox genre are prone to overfitting because they lack regularization terms.

To solve these problems, this thesis studied and proposed two novel survival analysis methods based on the gradient boosting decision tree and applied them to the study of breast cancer prognosis. The main contributions of this thesis are summarized as follows:

(1) A multi-output gradient boosting decision tree method, named HitBoost, is proposed. HitBoost directly predicts the density function of the first hitting time (time-to-event). Its objective function combines the maximum likelihood estimation function with a concordance index approximated by a convex function. HitBoost does not rely on any prior assumptions while still retaining model interpretability.

(2) A method named BecCox is developed. BecCox, which mainly optimizes algorithms of the Cox genre, uses a single gradient boosting decision tree to predict the proportional hazards of the event of interest. Its objective function combines a more precise partial likelihood estimation function with a concordance index approximated by a convex function, in order to reduce the prediction bias introduced by an inappropriate objective function.

(3) Using the proposed survival analysis methods and the clinical data of breast cancer patients collected from the Breast Cancer Research Center of West China Hospital of Sichuan University, this thesis built a recurrence prognosis model for early-stage breast cancer patients. It also demonstrated how to use the recurrence prognosis model for further analysis, such as exploring the important factors that affect breast cancer recurrence and making treatment recommendations.

To implement the proposed methods, this thesis first derived the gradients of the custom objective functions with respect to the predicted values, and then implemented model training under the XGBoost framework according to the type of model output. Experimental results on four benchmarks (WHAS, SUPPORT, METABRIC, and ROTT2) show that HitBoost achieves concordance indices of 0.929190, 0.631281, 0.668679, and 0.705427, respectively. It improves on the best existing method in the same category by up to about 2.8%, and it also surpasses methods that follow prior assumptions as well as the popular Random Survival Forest. BecCox achieves concordance indices of 0.898320, 0.631837, 0.645986, and 0.702102 on the same benchmarks. It improves on the best existing method in the same category by up to about 1.7%, and it also surpasses the classic Cox proportional hazards model and other algorithms of the Cox genre. Therefore, the gradient boosting decision tree based optimization methods proposed in this thesis can serve as effective survival analysis methods for research on clinical diseases or other specific events of interest.
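
The abstract mentions deriving gradients of a custom objective with respect to the predicted values and evaluating with the concordance index. As a rough illustration only, the sketch below shows what that step can look like for the standard (Breslow-form) Cox partial likelihood, which is not the more precise partial likelihood or the convex concordance surrogate used by BecCox. Gradient boosting frameworks such as XGBoost accept a per-sample gradient and second-derivative pair from a custom objective, which is the shape returned here. All function names (`cox_neg_log_partial_likelihood`, `cox_grad_hess`, `concordance_index`) are hypothetical and chosen for this example.

```python
import math


def cox_neg_log_partial_likelihood(scores, times, events):
    """Negative Breslow-form Cox partial log-likelihood (assumed form,
    not the thesis's objective). scores are predicted log-hazards."""
    total = 0.0
    for i in range(len(scores)):
        if events[i]:
            # Risk set: subjects still under observation at times[i].
            denom = sum(math.exp(scores[j])
                        for j in range(len(scores)) if times[j] >= times[i])
            total -= scores[i] - math.log(denom)
    return total


def cox_grad_hess(scores, times, events):
    """Gradient and diagonal second derivative of the loss above with
    respect to each predicted score."""
    n = len(scores)
    exp_s = [math.exp(s) for s in scores]
    grad = [0.0] * n
    hess = [0.0] * n
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(exp_s[j] for j in risk)
        grad[i] -= 1.0
        for j in risk:
            p = exp_s[j] / denom  # share of subject j in this risk set
            grad[j] += p
            hess[j] += p * (1.0 - p)
    return grad, hess


def concordance_index(scores, times, events):
    """Harrell's concordance index: the fraction of comparable pairs in
    which the subject with the earlier event has the higher score."""
    concordant, comparable = 0.0, 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if i had an event strictly before times[j].
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # ties in score count as half
    return concordant / comparable
```

The loops are quadratic in the number of subjects, which keeps the derivation easy to follow; a practical implementation would sort by time and use cumulative sums instead.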
Keywords/Search Tags: survival analysis, machine learning, gradient boosting decision tree, proportional hazard model, risk prediction