| Background: The Cox proportional hazards regression model is the default choice for analyzing right-censored time-to-event data. However, its restrictive assumptions are not always met in applications, and forest-based approaches are an attractive alternative to the Cox model because of their high flexibility. Several survival forest methods have been developed in recent years; three representative methods are random survival forests (RSF), conditional inference forests (CIF), and survival forests with maximally selected rank statistics (MSR-RF). Despite these methodological developments, only a few recent studies have compared the performance of survival forests, and these did not include MSR-RF. Moreover, they compared only predictive performance, not variable selection performance. Objective: In this paper we use a simulation study and a real data study to compare the prediction performance and variable selection performance of the three survival forests mentioned above: RSF, CIF and MSR-RF. Method: To evaluate variable selection performance, we pool all simulation runs and calculate the rate at which the correct variables rank at the top of the variable importance ordering; a higher rate means better identification ability. We use the integrated Brier score (IBS) to measure the prediction accuracy of all three methods; the smaller the IBS, the better the prediction. Results: 1. Simulations show that the three forests differ only slightly in prediction performance, although RSF and MSR-RF seem to perform a little better in most cases. Real data results show that each method has advantages under different conditions. 2. For variable selection performance: 1) When there are multiple categorical variables in the simulation datasets, the selection rate of RSF seems to be the lowest in most cases. MSR-RF and CIF have higher selection rates, and CIF performs best when an interaction term is present. 2) When there are only continuous variables in the simulation datasets, MSR-RF performs better; when there are only binary variables in the data, RSF and MSR-RF have an advantage over CIF. 3) The degree of correlation among the variables has little effect on the selection rate, which indicates that all three forest methods can handle correlated data. 4) As the number of variables increases, MSR-RF and RSF seem to be more robust than CIF. 5) MSR-RF is the least affected by sample size and has the most stable performance; RSF is at a disadvantage when the sample size is small. Conclusions: All three methods show advantages in prediction performance and variable selection performance under different situations. In practice, it is important to choose the appropriate method according to the research aim and the nature of the covariates. Compared with the more popular RSF, MSR-RF performs better when the data include covariates with many split points and the sample size is small, and CIF performs better when the data include covariates with many split points and interactions; MSR-RF and CIF therefore have practical value. |
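
The variable selection criterion described in the abstract (how often the correct variables rank at the top of the variable importance ordering across simulations) can be illustrated with a minimal sketch. This is one possible reading, not the authors' implementation: the function name `top_k_selection_rate`, the "all true variables within the top k" rule, and the example importance values are assumptions made here for illustration.

```python
def top_k_selection_rate(importances, true_idx, k=None):
    """Share of simulation runs in which every truly relevant variable
    ranks among the k largest variable importance values.

    importances: list of runs, each a list of importance scores per covariate
    true_idx:    indices of the covariates that truly affect survival
    k:           size of the "top" set; defaults to the number of true variables
    """
    true_set = set(true_idx)
    if k is None:
        k = len(true_set)
    hits = 0
    for run in importances:
        # Covariate indices ordered from most to least important in this run.
        ranked = sorted(range(len(run)), key=lambda j: run[j], reverse=True)
        if true_set.issubset(ranked[:k]):
            hits += 1
    return hits / len(importances)


# Hypothetical importances: 4 simulation runs, 5 covariates,
# with covariates 0 and 1 assumed to be the true signals.
vimp = [
    [0.9, 0.8, 0.1, 0.0, 0.2],
    [0.7, 0.1, 0.6, 0.0, 0.2],
    [0.5, 0.6, 0.1, 0.3, 0.2],
    [0.2, 0.9, 0.1, 0.0, 0.3],
]
rate = top_k_selection_rate(vimp, true_idx=[0, 1])
print(rate)
```

A higher rate corresponds to better identification ability. The same function could be applied to the importance scores produced by each forest method to compare them on equal terms.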