| Objective: This paper compares and evaluates five methods that can integrate clinical and multi-omics data for prognostic prediction (the block forest, Cox-nnet, IPF-LASSO, Mboost and Survival SVM models) through simulation experiments and a case analysis, in order to provide guidance for modeling high-dimensional data in survival analysis. Methods: First, the algorithms of the block forest, Cox-nnet, IPF-LASSO, Mboost and Survival SVM models were described. Second, six simulation scenarios were designed: experiment A generated data with different degrees of sparsity; experiment B generated data with different variable strengths; experiment C generated data with different interaction strengths; experiment D generated data with different censoring ratios; experiment E generated data with different between-group correlations; and experiment F generated clinical and omics data with different variable strengths. The prediction performance of the five models in each scenario was compared using the concordance index. Finally, survival information, clinical information and multi-omics data of lung adenocarcinoma patients were obtained from the UCSC Xena database. On these real data, the concordance index, the time-dependent receiver operating characteristic (ROC) curve and decision curve analysis methods based on the nearest neighbor algorithm were used to further evaluate the prediction accuracy and clinical utility of the models. Results: In most simulation scenarios, the block forest model has the best prediction performance, and its advantage over the other models is more pronounced when the number of true variables is small, the variable strength is low and there is strong interaction. However, the prediction accuracy of the block forest model is lower than that of the
IPF-LASSO and Mboost models when the censoring ratio is higher. Scenarios with different between-group correlations and with different variable strengths across modules have little influence on any of the models. In all simulation scenarios, the Cox-nnet and Survival SVM models perform worse than the other three models. The case analysis shows that the block forest model has the highest prediction accuracy and offers a certain clinical benefit, followed by the Mboost model. When only clinical information and a single omics data type (mRNA data) are used, the predictive performance of all models decreases, while the block forest (BF) model still has the highest prediction accuracy. Conclusions: When integrating clinical and high-dimensional omics data, the block forest algorithm is recommended for constructing prognostic prediction models; it usually has high prediction accuracy and a certain clinical application value. When the data have a high censoring ratio, that is, when the number of events is small, it is advisable to also fit the Mboost model and compare the two in order to select the optimal model. |
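
The concordance index used to compare the models can be illustrated with a minimal sketch of Harrell's estimator for right-censored data. This is not the code used in the study: the function name `harrell_c_index` and the toy data are assumptions made for illustration, a higher risk score is assumed to mean a shorter expected survival, and pairs with tied survival times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; higher is assumed to mean higher risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the subject with the shorter
        # time actually experienced the event.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event has the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = harrell_c_index(times, events, [0.1, 0.5, 0.7, 0.9])
constant = harrell_c_index(times, events, [0.3, 0.3, 0.3, 0.3])
```

A value of 1 indicates that predicted risks rank all comparable pairs correctly, 0.5 corresponds to uninformative predictions, and 0 to a fully reversed ranking. As the censoring ratio grows, fewer pairs are comparable, which is one reason model rankings can shift in high-censoring scenarios.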