BACKGROUND: Propensity score (PS) methods are commonly used to control for confounders in real-world studies, and PS weighting in particular is increasingly common. PS weighting proceeds in four steps: (1) estimate the PS values, (2) weight the samples according to the PS values, (3) test the balance of confounders, and (4) estimate the effect. PS values are currently most often estimated with a logistic regression model. However, when the relationship between the confounders and the treatment factor is complex and unknown, it is difficult to specify that relationship correctly, so the PS values are estimated incorrectly and the effect estimate becomes inaccurate. Generalized boosted models (GBM) are adaptive regression models that do not require the relationships between the independent variables and the dependent variable to be specified manually. When GBM is applied to PS estimation, the model finds the relationships between the confounders and the treatment factor through iterations and thereby yields more accurate PS estimates. Inverse probability weighting (IPW) is currently the most commonly used weighting method, but extreme weights from IPW distort the effect estimate, especially when the PS values of the treatment group and the control group overlap little. Overlap weighting (OW), by contrast, avoids the extreme weights caused by poor PS overlap.

PURPOSE: To construct a GBM-OW model by combining GBM with OW, in order to address inaccurate PS estimation and extreme weights in the case of binary treatment, obtain more accurate effect estimates, and provide a new statistical analysis method for controlling confounders in real-world studies.

METHODS: GBM was combined with OW according to the balance of confounders to construct the GBM-OW model. The model was evaluated on simulated data covering a range of scenarios. The simulation study comprised 1008 scenarios in total, formed by combining 7 kinds of relationship between
confounders and the treatment factor, 4 degrees of PS overlap, 2 treatment group ratios, 6 classes of sample size and 3 outcome types; each scenario was simulated 1000 times. The Kolmogorov-Smirnov (KS) statistic and the absolute standardized mean difference (ASMD) were used to evaluate how well the GBM-OW model balanced the confounders. Relative bias (RB), root mean squared error (RMSE) and the 95% confidence interval coverage rate (95% CICR) were used to evaluate the model's performance in effect estimation. The results were compared with those of a multivariate adjustment regression model (adjusted), a logistic-IPW model, a logistic-OW model and a GBM-IPW model. The GBM-OW model was then applied to the Medical Information Mart for Intensive Care (MIMIC-IV) database to analyze whether statin use is associated with clinical outcomes in patients with acute kidney injury (AKI) in the intensive care unit (ICU). A further simulation study was carried out based on these case data.

RESULTS: In the simulated data, the average KS statistics of the unweighted, logistic-IPW, logistic-OW, GBM-IPW and GBM-OW models were 0.3687, 0.1584, 0.0728, 0.1402 and 0.0160, respectively. The mean ASMD values were 0.9312, 0.2668, <0.0001, 0.3648 and 0.0315, respectively. When the outcome was continuous, the average RB of the adjusted, logistic-IPW, logistic-OW, GBM-IPW and GBM-OW models was 24.43%, 26.21%, 13.40%, 107.09% and 6.68%, respectively. The mean RMSE was 0.2694, 0.6100, 0.1688, 1.0947 and 0.1167, respectively. The average 95% CICR was 38.57%, 83.42%, 80.75%, 32.52% and 97.56%, respectively. When the outcome was binary, the average RB of the adjusted, logistic-IPW, logistic-OW, GBM-IPW and GBM-OW models was 56.51%, an extreme value, 27.65%, 88.58% and 24.02%, respectively. The mean RMSE was 0.3811, an extreme value, 0.2396, 0.5411 and 0.2374, respectively. The mean 95% CICR was 55.02%, 85.69%, 77.04%, 53.07% and 85.11%, respectively. When the outcome was survival, the average RB of the adjusted, logistic-IPW, logistic-OW, GBM-IPW and GBM-OW models was 87.60%, 24.52%, 27.10%, 89.21% and
24.99%, respectively. The mean RMSE was 0.4021, 0.2895, 0.1513, 0.3868 and 0.1473, respectively. The mean 95% CICR was 26.64%, 37.42%, 77.43%, 27.59% and 94.45%, respectively. In the case study, the results of the adjusted, logistic-IPW, logistic-OW, GBM-IPW and GBM-OW models for days of ICU stay (mean difference and 95% CI, statin users vs. non-users) were: -0.51 (-0.79 to -0.24), -0.41 (-0.84 to 0.03), -0.49 (-0.77 to -0.20), -0.30 (-0.71 to 0.12) and -0.38 (-0.67 to -0.09). The results of the five models for the 30-day in-ICU death risk (HR and 95% CI, statin users vs. non-users) were: 0.68 (0.59 to 0.79), 0.72 (0.56 to 0.93), 0.74 (0.64 to 0.87), 0.73 (0.58 to 0.92) and 0.82 (0.70 to 0.96). The simulation results based on the case study data were similar to those of the simulation study described above.

CONCLUSION: When applied to the simulated data and the MIMIC-IV data, the GBM-OW model ran smoothly and produced results successfully. In the simulation study, the overall performance of the GBM-OW model was better than that of the other models; in particular, it still performed well in complex data scenarios where the other models did not. In the case study, the GBM-OW model balanced the confounders well and estimated the effect successfully. The conclusion from the GBM-OW model was basically consistent with that of the other commonly used models, namely that AKI patients in the ICU who were treated with statins had better clinical outcomes than those who were not. In the simulation study based on the case study data, the GBM-OW model had better overall performance than the other models in RB, RMSE and 95% CICR. In summary, the GBM-OW model performs well in balancing confounders and estimating effects, and it can address the problems of inaccurate PS estimation and extreme weights in PS weighting analysis. Its advantages over the other models are greatest when the data scenario is complex and the sample size is sufficient. As shown by its application to the case study data and by the simulation research based on the case study
data, the GBM-OW model can be used as an analysis method to control for confounders in real-world studies.
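
The abstract contrasts IPW with OW and uses the ASMD as a balance measure, but it does not state the formulas. The following is a minimal sketch only, assuming the standard textbook definitions: IPW weights of 1/PS for treated units and 1/(1 - PS) for controls, OW weights of 1 - PS for treated units and PS for controls, and an ASMD whose denominator is the pooled unweighted standard deviation (one common convention). The function names `ipw_weight`, `overlap_weight` and `asmd` are hypothetical and are not taken from the study; PS estimation itself (logistic regression or GBM) is not shown.

```python
import math
import statistics


def ipw_weight(ps, treated):
    """Inverse probability weight: 1/ps if treated, 1/(1 - ps) if control."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def overlap_weight(ps, treated):
    """Overlap weight: 1 - ps if treated, ps if control (bounded by 0 and 1)."""
    return 1.0 - ps if treated else ps


def asmd(x, treated, w):
    """Absolute standardized mean difference of covariate x under weights w.

    Assumption: the denominator is the pooled *unweighted* SD of the two groups.
    """
    def wmean(flag):
        num = sum(xi * wi for xi, ti, wi in zip(x, treated, w) if ti == flag)
        den = sum(wi for ti, wi in zip(treated, w) if ti == flag)
        return num / den

    var_t = statistics.variance([xi for xi, ti in zip(x, treated) if ti])
    var_c = statistics.variance([xi for xi, ti in zip(x, treated) if not ti])
    return abs(wmean(True) - wmean(False)) / math.sqrt((var_t + var_c) / 2)


# Treated units with PS near 0 receive very large IPW weights,
# while the corresponding overlap weights stay between 0 and 1.
for ps in (0.02, 0.5, 0.98):
    print(ps, ipw_weight(ps, True), overlap_weight(ps, True))
```

Under these assumed definitions, a treated unit with a PS close to 0 (or a control with a PS close to 1) dominates an IPW analysis, whereas its overlap weight cannot exceed 1, which is the extreme-weight issue the abstract refers to.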