A Research On Strategy For Mining Clinical Modifiable Factors And Handling Its Missing Data In Electronic Health Record

Posted on: 2023-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: B R Yuan
GTID: 2544307046992619
Subject: Computer Science and Technology
Abstract/Summary:
Acute kidney injury (AKI) is a common and dangerous clinical syndrome in hospitalized patients and one of the most serious health problems worldwide. Combining Electronic Health Record (EHR) data with machine learning to build an AKI risk prediction model can help reduce the harm caused by delayed diagnosis and a high missed-diagnosis rate. However, AKI risk prediction models still face two main challenges. First, although prediction models achieve reliable predictive performance, intervention and prevention methods for AKI remain scarce; the key to prevention lies in identifying clinically modifiable factors and understanding their effects in different time windows. Second, AKI risk prediction models often encounter missing data in practical applications, which causes a significant drop in prediction performance.

To identify modifiable risk factors of AKI, this paper adapts the commonly used XGBoost algorithm following the concept of multi-view learning and proposes a Multi-view XGBoost algorithm. To avoid underestimating the role of modifiable features because of their correlation with non-modifiable features, we adjust the model's attention to each view according to how modifiable that view is in hospital. We also design a view weight adjustment mechanism that avoids the performance degradation caused by excessive focus on, or neglect of, certain types of features, and ensures that interactions between different features can still be learned effectively. Then, building on the commonly used interpretation method SHAP, we propose two metrics, Inter-class Difference and Exposed Score Difference, to estimate the cumulative contribution of each feature to prediction over the whole dataset. Finally, we introduce a temporal analysis of how risk factors affect the change in patient risk across different time windows, and we estimate the potential benefit of intervening on modifiable features at a specific time.

Experiments showed that electrolyte balance-related indicators explained 38.3% of the change in patient risk from 72 hours prior to 24 hours prior, followed by high-risk drugs (13.7%), nursing strategies (12.1%), blood pressure (10%), infections (7.8%) and anemia (5.4%). The effects of heart surgery and related conditions, ventilator use, and anemia can last for more than 72 hours.

Further experiments show that tuning predictor importance with our method can significantly reduce the impact of missing data on model performance, and that this mechanism works independently of common imputation strategies. Combining the two strategies significantly mitigates the influence of missing data: cross-validation selects the best imputation method for each feature, and Multi-view XGBoost reduces the importance of features with high missingness in the prediction model. In a simulated case in which the data of two modifiable views (laboratory data and physical sign data) are completely absent, the model that combines cross-validated selection of the best imputation method with Multi-view XGBoost achieves an AUROC of 0.774 (95% CI: 0.770-0.777), significantly outperforming the baseline model (0.717, 95% CI: 0.713-0.722) and coming very close to the model that does not consider these two views (0.782, 95% CI: 0.780-0.784).
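
The abstract states that cross-validation is used to pick the best imputation method for each feature, but it does not say which imputers were compared or how they were scored. The sketch below is therefore only one possible reading, not the thesis's implementation. It assumes three simple candidates (mean, median, zero fill) and scores each by the mean absolute error of reconstructing held-out observed values. The names `IMPUTERS`, `cv_imputation_error`, `select_imputers` and the feature name `lab_value` are hypothetical.

```python
import random
import statistics

# Hypothetical candidate imputers: each maps observed training values
# to a single fill value. The thesis may have compared different methods.
IMPUTERS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "zero": lambda values: 0.0,
}


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_imputation_error(observed, imputer, k=5, seed=0):
    """Mean absolute error when held-out observed values are imputed.

    Assumption: reconstruction error is the selection criterion; a
    downstream-model score could be substituted here.
    """
    k = min(k, len(observed))
    errors = []
    for held_out in kfold_indices(len(observed), k, seed):
        held = set(held_out)
        train = [v for i, v in enumerate(observed) if i not in held]
        fill = imputer(train)
        errors.extend(abs(observed[i] - fill) for i in held_out)
    return statistics.fmean(errors)


def select_imputers(columns, k=5):
    """Pick the lowest-error imputer per feature.

    columns: dict mapping feature name -> list of floats, None = missing.
    Features with fewer than two observed values get None (nothing to validate).
    """
    choice = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        if len(observed) < 2:
            choice[name] = None
            continue
        scores = {m: cv_imputation_error(observed, f, k) for m, f in IMPUTERS.items()}
        choice[name] = min(scores, key=scores.get)
    return choice


# A skewed hypothetical feature: one outlier makes the median the safer fill.
example = {"lab_value": [1.0, None, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 100.0, None]}
print(select_imputers(example))  # {'lab_value': 'median'}
```

Selecting per feature rather than globally lets skewed features and roughly symmetric ones use different fill strategies, which is the intuition behind the per-feature selection described above.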
Keywords/Search Tags: Acute Kidney Injury, Electronic Health Records, Machine Learning, Multi-view Learning, Modifiable Feature Identification, Factor Temporal Analysis, Data Missing