Background
Longitudinal data and survival data are common data types in medical research. For longitudinal data, the most widely used method is the linear mixed-effects (LME) model. For survival data, the Cox model or the accelerated failure time (AFT) model is often used. In practice, when a survival analysis includes time-dependent covariates, the two models are often fitted separately, which results in greater bias. In addition, although the extended Cox model incorporates time-dependent covariates, it assumes that a covariate's value stays constant between two consecutive measurements. It therefore does not correct for measurement error, and the extended Cox model also produces non-negligible bias. In recent years, joint models have been proposed that analyze longitudinal data and survival data simultaneously. However, there are many types of joint models, mainly the LME-Cox Joint Model, the LME-AFT Joint Model and the Bayesian Joint Model. In medical research, how to choose an appropriate joint model according to the characteristics of the data, such as the survival distribution, the sample size, and whether the proportional hazards assumption is met, is a question worth discussing.

Objective
This study sets up multiple data scenarios through data simulation. In each scenario, several models are built, including the LME-Cox Joint Model, the LME-AFT Joint Model, the Bayesian Joint Model and the Time-Varying Cox Model, to evaluate their performance and compare the accuracy of their parameter estimates. In addition, the Joint Model is applied to real medical data to study the impact of a time-dependent variable on patient survival outcomes, and a dynamic prediction model is built to predict personalized survival probabilities and provide a reference for clinical decisions.

Method
Data simulation: Following the methodological derivations and software developed by Bender, Crowther et al. for simulating survival data under complex conditions, this study conducted a data simulation. Each simulated data set contains time-fixed covariates (a dichotomous and a continuous covariate), a time-dependent covariate (longitudinal data), survival time and survival outcome. The simulation used the ‘survsim v4.0.9’ package in Stata 16. For the time-fixed covariates, two variables were generated: ‘trt’, a binary variable following a Bernoulli distribution, and ‘age’, a continuous variable following a normal distribution with a mean of 65 and a standard deviation of 12. For the time-dependent covariate, longitudinal repeated measurements were simulated from a linear mixed-effects model. Survival times and survival outcomes were simulated using the closed-form expression derived by Austin. The scenarios crossed the strength of association α (α = -1.5, -0.5, 0.5, 1.5), the survival distribution (exponential, Weibull), whether the proportional hazards assumption was satisfied (yes, no), the sample size (N = 200, 500, 1000) and the follow-up time (T = 3, 5, 10), giving 4 × 2 × 2 × 3 × 3 = 144 data scenarios.

Model construction and evaluation: In each scenario, the LME-Cox Joint Model, LME-AFT Joint Model, Bayesian Joint Model and Time-Varying Cox Model were fitted to estimate the effect of the time-dependent covariate. The models were evaluated in terms of the accuracy of parameter estimation (relative bias, mean squared error) and precision (standard error, 95% confidence interval coverage rate).

Case study: The LME-Cox Joint Model was applied to APACHE-II scores measured at multiple time points in patients with severe stroke, to evaluate the effect of time-dependent changes in the APACHE-II score on the risk of death, to establish a dynamic prediction model, and to develop a visual interface.

Results
Results of simulation study: Under the exponential distribution, when the time-dependent covariate was a risk factor (α > 0), the LME-Cox Joint Model had the smallest relative bias and mean squared error, and hence the highest accuracy. When α < 0 and the baseline covariates satisfied the proportional hazards assumption, the LME-AFT Joint Model had the smallest relative bias and mean squared error. When some covariates did not satisfy the proportional hazards assumption, the Bayesian Joint Model and the LME-Cox Joint Model had smaller relative bias and mean squared error. In terms of precision, the Bayesian Joint Model had the smallest standard error. When the time-dependent covariate was a risk factor, the LME-Cox Joint Model had the highest 95% confidence interval coverage rate overall; when it was a protective factor, the Bayesian Joint Model had a higher coverage rate overall. Under the Weibull distribution, when the time-dependent covariate was a risk factor, the LME-Cox Joint Model had relatively small relative bias and standard error. When the time-dependent covariate was a protective factor with a weak association (α = -0.5), the relative bias and standard error were also relatively small. The Bayesian Joint Model had the lowest accuracy, but it had a higher 95% confidence interval coverage rate and was more stable. In general, the models showed the following characteristics across scenarios:
1. As the sample size increased, the accuracy and precision of each model improved.
2. As the follow-up time increased, the accuracy and precision of each model improved.
3. When some baseline covariates did not meet the proportional hazards assumption, the accuracy of the parameter estimates decreased.

Results of case studies: The Joint Model showed that multi-time-point APACHE-II scores had a significant effect on the risk of death in patients with severe stroke (P < 0.001). In the longitudinal submodel, measurement time, age, gender, baseline SOFA score and smoking history (unknown vs. non-smoking) affected the APACHE-II score. After adjusting for these factors, the survival submodel showed that each 1-point increase in the time-dependent APACHE-II score increased the risk of death in patients with severe stroke by 39% (95% CI: 24% to 56%). A dynamic prediction model was established; external validation gave overall AUC values for predicting 14-day mortality risk of 0.76, 0.76, 0.78 and 0.76 at days 6, 8, 10 and 12 of follow-up, respectively, indicating good predictive performance.

Conclusion
The parameter estimation performance of joint models differs across data scenarios, so an appropriate joint model should be selected according to the data scenario. In the medical field, the joint model is an effective method for modeling the effects of time-dependent covariates on survival outcomes and for dynamically predicting individual survival.
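For reference, the LME-Cox Joint Model compared above is commonly written in the shared-parameter form below. The abstract does not state the exact specification, so this notation is an assumption following the standard formulation: a longitudinal submodel for the true trajectory m_i(t), a proportional hazards survival submodel linked through the association parameter α, and the conditional survival probability used for dynamic prediction. The third line only translates the reported 39% increase per APACHE-II point into the α scale.

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad
m_i(t) = \mathbf{x}_i^\top(t)\,\boldsymbol{\beta} + \mathbf{z}_i^\top(t)\,\mathbf{b}_i, \qquad
\mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}),\ \varepsilon_i(t) \sim N(0, \sigma^2) \\
h_i(t) &= h_0(t)\exp\{\boldsymbol{\gamma}^\top \mathbf{w}_i + \alpha\, m_i(t)\} \\
\mathrm{HR} &= e^{\alpha}\ \text{per 1-unit increase in } m_i(t); \qquad
e^{\hat\alpha} = 1.39 \;\Rightarrow\; \hat\alpha = \ln 1.39 \approx 0.33 \\
\pi_i(u \mid t) &= \Pr\{T_i^* \ge u \mid T_i^* > t,\ \mathcal{Y}_i(t)\}
\approx \frac{S_i\{u \mid \mathcal{M}_i(u, \hat{\mathbf{b}}_i)\}}{S_i\{t \mid \mathcal{M}_i(t, \hat{\mathbf{b}}_i)\}}, \qquad u > t
\end{aligned}
```

Here \mathcal{Y}_i(t) denotes the longitudinal measurements observed up to time t and \mathcal{M}_i(\cdot) the fitted trajectory history. A 14-day mortality prediction made at day 6, 8, 10 or 12 corresponds to 1 - \pi_i(u \mid t) evaluated at the chosen horizon u.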
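The simulation step can be sketched in Python for the exponential case. This is a minimal sketch, not the survsim implementation used in the study. It assumes a random-intercept-and-slope trajectory m_i(t) = b0_i + b1_i t and the hazard h_i(t) = λ exp(γ_trt·trt + γ_age·(age - 65) + α m_i(t)). Under these assumptions the cumulative hazard H(t) = scale · (exp(slope·t) - 1)/slope has a closed form that can be inverted at H(T) = -log U, which is the idea behind Austin's closed-form generator. Only age ~ N(65, 12²) and the Bernoulli trt come from the text. λ, the γ coefficients, the fixed effects, the random-effect and error SDs, P(trt = 1) = 0.5 and the yearly visit schedule are hypothetical values chosen for illustration.

```python
import numpy as np


def cumulative_hazard(t, scale, slope):
    """H(t) = scale * (exp(slope*t) - 1) / slope, using the limit scale*t when slope ~ 0."""
    t = np.asarray(t, dtype=float)
    small = np.abs(slope) < 1e-12
    safe_slope = np.where(small, 1.0, slope)
    return np.where(small, scale * t, scale * np.expm1(safe_slope * t) / safe_slope)


def invert_cumulative_hazard(target, scale, slope):
    """Solve H(t) = target for t; returns inf when H(t) is bounded below target (slope < 0)."""
    small = np.abs(slope) < 1e-12
    safe_slope = np.where(small, 1.0, slope)
    x = safe_slope * target / scale
    with np.errstate(divide="ignore", invalid="ignore"):
        t_curved = np.where(x > -1.0, np.log1p(x) / safe_slope, np.inf)
    return np.where(small, target / scale, t_curved)


def simulate_joint_data(n, alpha, follow_up, lam=0.1, gamma_trt=-0.5, gamma_age=0.02,
                        beta0=1.0, beta1=0.3, sd_b0=0.5, sd_b1=0.2, sd_e=0.3, seed=None):
    """Hypothetical exponential-baseline joint-model generator (all defaults are assumptions)."""
    rng = np.random.default_rng(seed)
    trt = rng.binomial(1, 0.5, size=n)               # assumed P(trt = 1) = 0.5
    age = rng.normal(65.0, 12.0, size=n)             # age ~ N(65, 12^2), as in the study
    b0 = beta0 + rng.normal(0.0, sd_b0, size=n)      # subject-specific intercepts
    b1 = beta1 + rng.normal(0.0, sd_b1, size=n)      # subject-specific slopes
    eta = gamma_trt * trt + gamma_age * (age - 65.0)
    scale = lam * np.exp(eta + alpha * b0)           # hazard at t = 0
    slope = alpha * b1                               # log-hazard growth rate from alpha * m_i(t)
    target = rng.exponential(1.0, size=n)            # -log(U) ~ Exp(1)
    t_event = invert_cumulative_hazard(target, scale, slope)
    time = np.minimum(t_event, follow_up)            # administrative censoring at T
    event = (t_event <= follow_up).astype(int)

    visits = np.arange(0.0, follow_up, 1.0)          # hypothetical yearly visits
    ids, times, ys = [], [], []
    for i in range(n):
        t_i = visits[visits < time[i]]               # visits before event/censoring only
        # observed values = true trajectory + measurement error (not used in the hazard)
        y_i = b0[i] + b1[i] * t_i + rng.normal(0.0, sd_e, size=t_i.size)
        ids.append(np.full(t_i.size, i))
        times.append(t_i)
        ys.append(y_i)
    survival = {"id": np.arange(n), "trt": trt, "age": age, "time": time, "event": event}
    longitudinal = {"id": np.concatenate(ids), "time": np.concatenate(times),
                    "y": np.concatenate(ys)}
    return survival, longitudinal
```

For example, `simulate_joint_data(500, alpha=0.5, follow_up=5, seed=42)` produces one replicate for an N = 500, T = 5, α = 0.5 scenario. The hazard is driven by the error-free trajectory while the analyst only sees the noisy `y`, which is exactly the measurement-error setting in which a time-dependent Cox model is expected to be biased.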
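The evaluation criteria named in the methods (relative bias, mean squared error, standard error and 95% confidence interval coverage) can be computed per scenario as below. This is a sketch under stated assumptions: relative bias is taken as (mean estimate - true α)/true α, the interval is a Wald-type estimate ± 1.96 × SE (for the Bayesian Joint Model a posterior credible interval would normally be substituted), and because the abstract does not say which standard error is reported, both the empirical SD of the estimates and the mean model-based SE are returned. The function name `performance_metrics` is hypothetical.

```python
import numpy as np


def performance_metrics(estimates, std_errors, true_value, z=1.96):
    """Simulation performance measures for one parameter across replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    if true_value == 0:
        raise ValueError("relative bias is undefined when the true value is 0")
    lower, upper = est - z * se, est + z * se                   # Wald-type CI per replication
    return {
        "relative_bias": (est.mean() - true_value) / true_value,  # fraction; x100 for percent
        "mse": np.mean((est - true_value) ** 2),
        "empirical_se": est.std(ddof=1),                         # SD of estimates across replications
        "mean_model_se": se.mean(),                              # average reported standard error
        "coverage": np.mean((lower <= true_value) & (true_value <= upper)),
    }
```

For instance, `performance_metrics([1.4, 1.6, 1.5], [0.1, 0.1, 0.1], true_value=1.5)` gives a relative bias of 0, an MSE of 0.02/3 and a coverage of 1.0. Since none of the simulated α values is 0, relative bias is always defined in this design.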