| Mortality in patients with liver dysfunction is traditionally predicted with the APACHE scoring system, and the relevant sample sets are likewise extracted according to that scoring system. This approach suits a wide range of patients, requires few samples, and is widely used in intensive care units at home and abroad. Unlike studies that build their sample sets around the APACHE scoring system, this paper extracts patient data directly from the MIMIC-III database and the Philips eICU Collaborative Research Database. After feature screening, missing-value processing, and data standardization, the resulting data have higher dimensionality and fewer missing values, and mortality prediction for patients with liver dysfunction is more accurate. Because the features of the data samples are complex, they were first screened by combining manual selection with variance comparison. For the missing values in the sample sets, we analyzed the Gini importance of the sample features and chose to fill missing values with the median on the MIMIC-III sample set and with the majority value on the eICU sample set. After normalizing the data, we used principal component analysis (PCA) to reduce the dimensionality of the sample features and compared the predictive models trained on the sample sets before and after dimensionality reduction. The results showed that the sample set without dimensionality reduction performed better. Next, we used machine learning methods such as random forest, support vector classification, and a multi-layer perceptron neural network to perform liver dysfunction prediction, mortality prediction, and mortality prediction for patients with liver dysfunction on the MIMIC-III and eICU sample sets, respectively. To confirm the stability and generalization ability of the prediction models, we introduced evaluation metrics such as recall and F1 score and analyzed each algorithm on each sample set in turn. To verify the validity of the research method, we compared our results with mortality prediction by the APACHE-IV scoring system and found that the machine learning algorithms used in this paper achieve higher prediction accuracy. While analyzing the predictive models, we compared the Gini importance of the sample features and found that some features are more important than others for predicting patient mortality. In particular, the anion gap shows a higher Gini importance when predicting mortality on the liver dysfunction sample set than on the original sample set, and mortality prediction performance on the liver dysfunction sample set declined after the anion gap was removed, suggesting a potential link between the anion gap and liver dysfunction. |
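
The missing-value filling, standardization, and recall/F1 evaluation described above can be illustrated with a minimal standard-library sketch. The function names, the `strategy` parameter, and the toy values are hypothetical and chosen only for illustration; they are not the study's implementation, and z-score scaling is assumed as the form of standardization.

```python
from statistics import mean, median, mode, pstdev


def fill_missing(column, strategy="median"):
    """Replace None entries with the median or the most frequent observed value."""
    observed = [v for v in column if v is not None]
    # "median" mirrors the MIMIC-III choice, "majority" the eICU choice.
    fill = median(observed) if strategy == "median" else mode(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit variance (z-score, an assumption)."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - mu) / sigma for v in column]


def recall_and_f1(y_true, y_pred):
    """Recall and F1 score for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy feature column with one missing measurement (values are made up).
anion_gap = fill_missing([12.0, None, 14.0, 18.0], strategy="median")
scaled = standardize(anion_gap)
```

Filling before scaling keeps the imputed values from distorting the mean and standard deviation less than leaving gaps would, and reporting recall alongside F1 guards against a model that looks accurate only because deaths are the minority class.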