| Objective: Sepsis is life-threatening organ dysfunction caused by a dysregulated host response to infection, and it is one of the leading causes of death in critically ill patients. Because its early clinical symptoms lack specificity and it progresses aggressively, sepsis is difficult to control once it occurs, and mortality is high. The multifactorial nature of the disease makes early diagnosis a major challenge for clinicians. In this study, we selected related factors in patients with suspected infection before progression to sepsis and used random forest (RF) to build an early risk warning model of sepsis for these patients. We then compared it with a warning model built by logistic regression (LR) and performed internal validation to explore the application value of the model, aiming to offer a simple and feasible method for clinical assessment of the potential risk of sepsis. Methods: A retrospective analysis was performed using the Medical Information Mart for Intensive Care III (MIMIC-III) database. The general conditions, vital signs and laboratory examinations of patients with suspected infection were extracted, and related factors were selected. The patients were randomly divided into a training set and a validation set at a ratio of 0.7:0.3. Early risk warning models of sepsis for patients with suspected infection were then built with RF and LR respectively, using the training set data. The two models were compared by the area under the receiver operating characteristic curve (AUC) to further evaluate the warning efficiency of the RF model. Finally, the RF model was internally validated on the validation set data, with AUC used to evaluate its stability and effectiveness. Results: 1) A total of 2339 patients with suspected infection were enrolled, 1654 in the training set and 685 in the validation set. The general conditions, vital signs and laboratory examinations of the two sets showed no significant differences. 2) In the variable importance scores of the RF model, eight characteristic variables (age, mean arterial pressure, heart rate, hemoglobin, platelets, serum creatinine, blood urea nitrogen and lymphocytes) had higher scores and clinical significance. The sensitivity of the RF model was 65.8%, the specificity was 84.1%, the Youden index was 0.499, and the AUC was 0.830 (95% CI: 0.811-0.848). The factors finally included in the LR model were age, serum creatinine, blood urea nitrogen and C-reactive protein. The sensitivity of the LR model was 58.2%, the specificity was 59.8%, the Youden index was 0.180, and the AUC was 0.620 (95% CI: 0.596-0.643). These results indicated that the RF model had substantially higher warning efficiency than the LR model for whether patients with suspected infection progressed to sepsis. 3) In internal validation on the validation set data, the AUC of the RF model was 0.812, close to its AUC in the training set, indicating that the RF model of early risk warning for sepsis has high stability and effectiveness. Conclusions: 1) Compared with LR, the RF algorithm has higher predictive performance for whether patients with suspected infection progress to sepsis. 2) The RF model of risk warning for sepsis has high stability. |
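
The Youden index values reported above follow directly from the stated sensitivity and specificity, using the standard definition J = sensitivity + specificity - 1. The snippet below is a minimal sketch of that arithmetic only; the function name `youden_index` and the `reported` dictionary are illustrative and are not taken from the study's code.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Return the Youden index J = sensitivity + specificity - 1.

    Both inputs are fractions in [0, 1], not percentages.
    """
    return sensitivity + specificity - 1.0


# Sensitivity and specificity as reported in the abstract,
# converted from percentages to fractions.
reported = {
    "RF": {"sensitivity": 0.658, "specificity": 0.841},
    "LR": {"sensitivity": 0.582, "specificity": 0.598},
}

# Round to three decimals to match the precision used in the abstract.
youden = {
    name: round(youden_index(**metrics), 3)
    for name, metrics in reported.items()
}

for name, j in youden.items():
    print(f"{name}: Youden index = {j:.3f}")
```

The results agree with the reported 0.499 for RF and 0.180 for LR. A higher Youden index means a better combined sensitivity and specificity at the chosen cutoff.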