| Background and objective Kawasaki disease patients with intravenous immunoglobulin resistance are at greater risk of developing coronary artery abnormalities. Several scoring models have been established to predict resistance to intravenous immunoglobulin, but clinicians usually do not apply them because of their poor performance. This study aimed to use machine learning methods to build a model to predict intravenous immunoglobulin resistance in Kawasaki disease patients, assess the performance and clinical utility of the model through internal validation, and evaluate the importance of the features in the model. Materials and Methods We retrospectively collected data from a single center. The data comprised 753 observations and 82 variables. The variables fall into four categories: basic information, clinical features, sonography measurements, and laboratory results. A total of 644 observations were included in the analysis, and 124 of these patients were intravenous immunoglobulin resistant (19.3%). Data from patients who were discharged before September 2018 were included in the training set (n=498, 77.2%), while all data collected after September 1, 2018 were included in the test set (n=147, 22.8%). We considered 7 linear and nonlinear machine learning algorithms, namely logistic regression (L1 and L2 regularized), decision tree, random forest, AdaBoost, gradient boosting machine (GBM), and LightGBM, to predict intravenous immunoglobulin resistance. Twenty percent of the training set was used for hyperparameter tuning. We used the area under the ROC curve, accuracy, sensitivity, and specificity to evaluate the performance of each model. Additionally, feature importance was evaluated with SHapley Additive exPlanations (SHAP) values, and clinical utility was assessed with decision curve analysis. We also compared our model with several published scoring models. Results Compared with the other machine learning 
models, the GBM model had the best performance (area under the ROC curve 0.7423, accuracy 0.8844, sensitivity 0.3043, specificity 0.9919). Based on SHAP values, platelet count, blood calcium, and albumin-to-globulin ratio were the top three features of the model, and the three features with the highest SHAP values that pushed the prediction higher were platelet count, total bilirubin, and cholesterol. In the decision curve analysis, the net benefit of the GBM model was greater than that of the other machine learning models across threshold probabilities above 13%. We also compared the Kobayashi score, Egami score, Formosa score, and Kawamura score with the GBM model in our test set. The GBM model outperformed all four scoring models in area under the ROC curve, accuracy, and specificity. Conclusion Our study demonstrates that a machine learning model can predict intravenous immunoglobulin resistance in Kawasaki disease patients robustly. Such a model could be implemented as clinical decision support in the treatment of Kawasaki disease patients. |
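
The accuracy, sensitivity, and specificity reported for the GBM model, and the net benefit used in decision curve analysis, all derive from confusion-matrix counts. The following is a minimal Python sketch of those formulas, not the authors' code. The abstract does not report confusion-matrix counts: the values 7, 16, 123, and 1 below are an assumption, back-calculated as the only small-integer split of the 147 test patients that reproduces the three reported rates. The names `ConfusionCounts`, `gbm_test`, and `treat_all` are hypothetical. Net benefit follows the standard decision curve definition, TP/n minus FP/n weighted by the odds of the threshold probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes (positive = IVIG resistant)."""
    tp: int  # resistant, predicted resistant
    fn: int  # resistant, predicted responsive
    tn: int  # responsive, predicted responsive
    fp: int  # responsive, predicted resistant

    @property
    def n(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def sensitivity(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / c.n


def net_benefit(c: ConfusionCounts, threshold: float) -> float:
    """Net benefit at a threshold probability: TP/n - FP/n * pt / (1 - pt).

    The counts must come from classifying patients at that same threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)
    return c.tp / c.n - (c.fp / c.n) * odds


# ASSUMPTION: counts inferred from the reported rates, not given in the source.
gbm_test = ConfusionCounts(tp=7, fn=16, tn=123, fp=1)

# Reference strategy in decision curve analysis: treat every patient as
# resistant. Uses the same inferred 23 resistant / 124 responsive split.
treat_all = ConfusionCounts(tp=23, fn=0, tn=0, fp=124)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(gbm_test):.4f}")
    print(f"specificity = {specificity(gbm_test):.4f}")
    print(f"accuracy    = {accuracy(gbm_test):.4f}")
    print(f"treat-all net benefit at 13% = {net_benefit(treat_all, 0.13):.4f}")
```

Under these assumed counts the three rates round to the reported 0.3043, 0.9919, and 0.8844. The model's own net benefit is deliberately not computed here: the inferred counts correspond to whatever classification cutoff the authors used, which the abstract does not state, so only the treat-all reference strategy can be evaluated at the 13% threshold without further assumptions.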