| Sepsis is a life-threatening condition that occurs when the body’s response to infection causes tissue damage, organ failure, or death. Its morbidity and mortality rates remain high worldwide. Reliable early prediction helps improve sepsis outcomes. Studies have shown that delays in administering antibiotics for sepsis increase the patient’s organ damage and mortality and can cause irreversible harm to the patient’s health. At present, early prediction of sepsis onset time remains a major challenge for intensive care unit (ICU) patients. Meanwhile, existing models can hardly meet the actual needs of clinical decision support because most AI models for this task lack interpretability. Based on the above, this dissertation analyzes the ICU patients’ electronic medical record data shared by the PhysioNet/Computing in Cardiology Challenge 2019, and designs and develops a set of prediction models based on improved machine learning methods to automatically identify patients with sepsis, with a view to achieving early prediction of sepsis 6 to 12 hours before clinical onset. Moreover, the interpretability of the prediction models is analyzed to identify the most significant factors for predicting the onset of sepsis, so as to improve the explainability and reliability of the models. Firstly, this dissertation explored and mined the data sets, analyzed the characteristics of the data distribution, addressed the problems of data imbalance, high missing rate and low data quality through preprocessing, and finally constructed a high-quality sample set. Then, according to the characteristics of the data set, a feature expansion scheme was designed, and three groups of new features were extracted from the 40 original features in order to improve the availability and diversity of the data: informative missingness features (102), time series statistics in 6-hour sliding windows (30), and empirical scoring features from clinical scales (8). The subsequent
results showed that the feature engineering is effective in improving the accuracy and other evaluation metrics of prediction. Next, three sets of prediction algorithms based on Random Forest (RF), XGBoost and ensemble learning were developed for early sepsis prediction, and the models were trained, optimized, validated and evaluated in turn. The utility score of the optimal model (based on a heterogeneous ensemble) is 0.4601 and its AUROC is 0.8615. The utility score of the best XGBoost model is 0.4588 and its AUROC is 0.8608. The utility score of the best RF model is 0.4274 and its AUROC is 0.8465. The performance of the proposed models is comparable to or slightly better than that of other studies trained on the same data set. At the same time, corresponding Compact models were trained on concise feature subsets for use in the subsequent analysis. Finally, two interpretability analysis methods were introduced to study feature importance and the way each feature influences the best Compact models. The Morris sensitivity analysis method was applied to the proposed ML models, which is a novel use of the method, to calculate the degree and direction of each feature's influence on the model output. In addition, SHAP, an interpretability analysis method based on marginal contributions from game theory, was used as a comparison with and supplement to the Morris method. The results of the interpretability analysis indicated that Temp (temperature), ICULOS (ICU length of stay), the HR series (heart rate related features), the FiO2 series (fraction of inspired oxygen related features), Resp (respiratory rate related features) and the MAP series (mean arterial pressure related features) are the main risk indicators for early sepsis prediction. Further, the predicted probability of sepsis increases when the values of Temp and ICULOS are greater or the value of HospAdmTime is smaller. These results are consistent with clinical experience, which provides solid support for the interpretability and credibility of the
proposed models. To conclude, this dissertation established a complete data processing pipeline, then analyzed, trained and optimized machine learning models for early prediction of sepsis, and carried out in-depth research on the interpretability of the proposed models. In the end, this research achieved early prediction performance comparable to or slightly better than that of other studies on the same data set, and improved the transparency and credibility of the proposed models through qualitative and quantitative interpretability analysis, which provides theoretical support for deploying the models in clinical application scenarios. |
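
As an illustration of the feature expansion summarized above, the following is a minimal sketch of how informative missingness features and 6-hour sliding-window statistics might be derived from one hourly vital-sign series. It is not the dissertation's implementation: the function name `expand_features`, the output keys, and the choice of mean/min/max as the window statistics are assumptions made for this example only.

```python
from typing import Dict, List, Optional, Sequence


def expand_features(
    values: Sequence[Optional[float]], window: int = 6
) -> List[Dict[str, Optional[float]]]:
    """Build illustrative per-hour features from one hourly series.

    `values` holds one measurement per ICU hour, with None where the
    value was not recorded. All feature names here are hypothetical.
    """
    rows: List[Dict[str, Optional[float]]] = []
    last_seen: Optional[int] = None  # index of the latest recorded value
    for t, v in enumerate(values):
        if v is not None:
            last_seen = t
        # Recorded values inside the trailing window that ends at hour t.
        start = max(0, t - window + 1)
        recent = [x for x in values[start : t + 1] if x is not None]
        rows.append(
            {
                # Informative missingness: is the value absent, and for how long?
                "is_missing": int(v is None),
                "hours_since_measured": (
                    None if last_seen is None else t - last_seen
                ),
                # Sliding-window statistics over recorded values only.
                "win_mean": sum(recent) / len(recent) if recent else None,
                "win_min": min(recent) if recent else None,
                "win_max": max(recent) if recent else None,
            }
        )
    return rows


if __name__ == "__main__":
    heart_rate = [80.0, None, 90.0, None, None, None, None, None, 100.0]
    for hour, row in enumerate(expand_features(heart_rate)):
        print(hour, row)
```

Computing the statistics over recorded values only keeps the window features defined even when most hours are missing, while the two missingness features let a tree-based model use the measurement pattern itself as a signal.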