Research On Sepsis Prognosis With Ensemble Learning And Model Explanation Method

Posted on: 2021-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: X C Liu
GTID: 2504306107950489
Subject: Computer technology
Abstract/Summary:
Sepsis has a complicated pathogenesis, is accompanied by various complications, and progresses rapidly. These properties make clinical diagnosis and prognosis very difficult, and sepsis has become the third leading cause of death in the world. Finding biomarkers that identify sepsis early, so that treatment can begin in time, is the key to reducing mortality. Existing methods that use machine learning to find markers for sepsis from big data often face issues such as missing data and insufficient model interpretability. Incorporating comprehensive potential risk factors and using interpretable machine learning to discover early biomarkers for sepsis is therefore of theoretical and clinical significance. Based on clinical data collected and collated from Wuhan Tongji Hospital, this thesis carries out risk prediction and factor analysis for sepsis using ensemble learning and model interpretation methods.

First, after analysing the high-dimensional, sparse, heterogeneous and irregularly sampled characteristics of medical data, the available feature selection, ensemble learning and model interpretation methods are introduced. Second, according to the characteristics of electronic health records, a data extraction framework for multi-source heterogeneous medical events was designed and implemented to transform chaotic raw medical records into a structured vector representation of patients. Two sepsis research datasets were then extracted and cleaned: a general-population sepsis dataset and a Coronavirus Disease 2019 (COVID-19) sepsis dataset. Finally, because the sepsis datasets are high-dimensional and sparse, with high correlation and redundancy among features, a new feature selection algorithm that combines variable ranking with extreme gradient boosting trees was proposed for sepsis risk prediction. Features are sorted by the importance scores of the tree-based model and eliminated iteratively, and the predictive model is trained on the optimal feature subset. A post-hoc interpretation method was then adopted to explain the best model and analyse the influence of related risk factors on sepsis.

Data were extracted for 44926 patients who underwent blood culture in the laboratory of Tongji Medical College from 2012 to 2019 and for 4207 patients with COVID-19 from February to March 2020. Mapped from these patients, the normal sepsis dataset contains 16051 samples (6329 positive and 9722 negative), and the COVID-19 sepsis dataset contains 2453 samples (1376 sepsis-positive and 1077 sepsis-negative). The prediction model achieved 91.17% AUC and 83.07% accuracy on the normal sepsis dataset, and 93.49% AUC and 91.03% accuracy on the COVID-19 sepsis dataset. The model interpretation method found high-risk evidence for children with sepsis and analysed the effect of coagulation function and inflammatory factors on COVID-19 sepsis. Using 45 indicators such as inflammatory factors, abnormal blood coagulation in patients with COVID-19 can be predicted an average of 3.68 days in advance, with 88.57% accuracy and 94.31% AUC.
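
The feature selection step described in the abstract (rank features by a tree model's importance scores, eliminate iteratively, keep the best subset) might look roughly like the sketch below. This is an illustration under stated assumptions, not the thesis's actual algorithm: the function name `rank_and_eliminate`, its parameters, the toy feature names and the toy weights are all hypothetical. The importance and scoring steps are passed in as callbacks; in a real pipeline they would presumably wrap a trained gradient-boosted tree model and a validation metric such as AUC, whereas here they are fixed lookups so the example runs with the standard library only.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_and_eliminate(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    drop_per_round: int = 1,
    min_features: int = 1,
) -> Tuple[List[str], float]:
    """Return the best-scoring feature subset seen during elimination.

    Hypothetical helper: importance_fn stands in for retraining a tree
    model and reading its importance scores; score_fn stands in for a
    validation metric computed on the given subset.
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) - drop_per_round >= min_features:
        importances = importance_fn(current)
        # Rank from most to least important, then drop the tail.
        ranked = sorted(current, key=lambda f: importances[f], reverse=True)
        current = ranked[: len(ranked) - drop_per_round]
        score = score_fn(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins (assumptions for illustration only, not thesis data).
TOY_WEIGHT = {
    "signal_1": 0.5,
    "signal_2": 0.3,
    "signal_3": 0.2,
    "noise_1": 0.0,
    "noise_2": 0.0,
}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    # A real callback would refit the model on `subset` first.
    return {f: TOY_WEIGHT[f] for f in subset}


def toy_score(subset: List[str]) -> float:
    # Reward useful features, penalise subset size slightly.
    return sum(TOY_WEIGHT[f] for f in subset) - 0.05 * len(subset)


all_features = ["signal_1", "noise_1", "signal_2", "noise_2", "signal_3"]
best_subset, best_score = rank_and_eliminate(
    all_features, toy_importance, toy_score
)
print(best_subset, round(best_score, 2))
```

With these toy weights the two noise features are eliminated first and the three signal features are kept. Recomputing importances in every round, rather than ranking once, is a design choice that lets the ranking shift as correlated or redundant features are removed.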
Keywords/Search Tags: Sepsis, Machine Learning, COVID-19, Electronic Health Records, Model Interpretability