Cancer is one of the major threats to human life and health, and its impact continues to grow. Accurately predicting the survival status and survival time of cancer patients helps doctors formulate effective personalized treatments and helps patients better understand their condition. With the development of high-throughput sequencing technology, research on genomic data in cancer occurrence and progression has advanced rapidly. Using gene expression data to predict the survival of cancer patients can not only help physicians develop better treatment options but also provide clinical interpretation. Survival analysis and prediction methods have therefore attracted great attention in studies of the relationship between genomic and clinical characteristics and the survival of cancer patients. However, the high dimensionality and complex nonlinear characteristics of genomic data pose great challenges to the accuracy and stability of existing survival prediction methods. In addition, cancer survival data contain a certain proportion of censored observations, which makes it difficult for common prognosis prediction methods to assess disease risk accurately.

To overcome the limitations of traditional survival analysis methods, this thesis combines the Extreme Gradient Boosting (XGBoost) algorithm with the Elastic-Net Cox algorithm to construct a new survival analysis prediction framework. Elastic-Net is used to improve the adaptability of the Cox model to high-dimensional genomic data, and XGBoost is combined with it to enhance the nonlinear learning ability of the model. The framework is verified experimentally on 15 cancer datasets from the TCGA database.

Firstly, the thesis proposes the XGBENC (XGBoost with Elastic Net-Cox) algorithm. Building on the extreme gradient boosting framework, Elastic-Net Cox proportional hazards regression is added to improve the adaptability of the algorithm to high-dimensional nonlinear data. By calculating the first and second derivatives of the loss function, the loss function in the extreme gradient boosting framework is redefined and the construction of the survival tree is improved. Then, a cancer survival prediction model based on XGBENC is constructed, and its performance is tested on fifteen cancer-type datasets provided by the TCGA database. Internal five-fold cross-validation with grid search is used to optimize the model parameters, and external five-fold cross-validation is used to assess the model’s generalization ability. XGBENC is compared with six existing survival analysis models on the 15 datasets, using the C-index and the time-dependent AUC as the main evaluation metrics. The experimental results show that XGBENC outperforms the other models on 14 datasets, with an average C-index increase of 15.73% and an average time-dependent AUC increase of 14.9%. The analysis shows that XGBENC achieves higher prediction accuracy on high-dimensional nonlinear gene data, and its validation on real datasets indicates that it is a reliable and accurate algorithm.
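
The derivative computation mentioned above can be illustrated with a minimal sketch. It assumes the loss is the plain negative Cox partial log-likelihood (Breslow form) taken with respect to each patient's predicted log-risk score; the abstract does not specify how the Elastic-Net term enters the XGBENC loss, so that penalty is left out here. The function names `cox_neg_log_partial_likelihood` and `cox_grad_hess` are hypothetical and are not taken from the thesis.

```python
import math


def cox_neg_log_partial_likelihood(scores, times, events):
    """Negative Cox partial log-likelihood (Breslow form), computed naively."""
    total = 0.0
    n = len(scores)
    for i in range(n):
        if events[i]:
            # Risk set: every sample still under observation at time t_i.
            risk = sum(math.exp(scores[j]) for j in range(n) if times[j] >= times[i])
            total -= scores[i] - math.log(risk)
    return total


def cox_grad_hess(scores, times, events):
    """First derivative and diagonal second derivative of the loss above
    with respect to each score. events[i] is 1 for an observed event and
    0 for a censored sample."""
    n = len(scores)
    exp_f = [math.exp(s) for s in scores]

    # Pass 1 (longest time first): risk-set sum S_i for every sample.
    order = sorted(range(n), key=lambda i: times[i], reverse=True)
    risk_sum = [0.0] * n
    running = 0.0
    pos = 0
    while pos < n:
        end = pos
        # Samples with tied times share the same risk set.
        while end < n and times[order[end]] == times[order[pos]]:
            running += exp_f[order[end]]
            end += 1
        for k in range(pos, end):
            risk_sum[order[k]] = running
        pos = end

    # Pass 2 (shortest time first): accumulate 1/S_i and 1/S_i^2 over
    # all events that occurred at or before each sample's time.
    ascending = order[::-1]
    grad = [0.0] * n
    hess = [0.0] * n
    inv_sum = 0.0
    inv_sq_sum = 0.0
    pos = 0
    while pos < n:
        end = pos
        while end < n and times[ascending[end]] == times[ascending[pos]]:
            i = ascending[end]
            if events[i]:
                inv_sum += 1.0 / risk_sum[i]
                inv_sq_sum += 1.0 / risk_sum[i] ** 2
            end += 1
        for k in range(pos, end):
            i = ascending[k]
            grad[i] = exp_f[i] * inv_sum - events[i]
            hess[i] = exp_f[i] * inv_sum - exp_f[i] ** 2 * inv_sq_sum
        pos = end
    return grad, hess
```

Gradient boosting needs only these two per-sample quantities to choose tree splits and leaf values. In XGBoost, a custom objective passed through the `obj` argument of `xgboost.train` returns exactly such a gradient and Hessian pair, which is one plausible way a redefined survival loss could be plugged into the framework.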