Objective: To construct a Random Forest (RF) model and a traditional logistic regression model, compare their performance to identify the better prediction model, and explore the risk factors influencing the prognosis of pemphigus vulgaris (PV), so as to provide clinical guidance and help improve the survival rate of patients with PV.

Methods: Clinical data of patients admitted to the First Hospital of Shanxi Medical University and diagnosed with PV from April 2015 to April 2022 were retrospectively collected, and 1-year mortality was used as the prognostic indicator. Univariate analysis was conducted to preliminarily identify the risk factors affecting PV prognosis. After feature selection, an RF model and a classical logistic regression model were constructed to rank the factors affecting PV prognosis, and the performance of the two prediction models was compared.

Results: 1. A total of 76 patients were included in this study, of whom 18 died within 1 year, a mortality rate of 23.7%. 2. PV occurred mostly in middle-aged patients and was rare in children. 3. Mucosal lesions were present on admission in 63% of patients with PV. 4. Multiple imputation (MI) was used to fill in missing data values. Univariate analysis showed that mucosal involvement of the rash, decreased serum albumin, elevated ESR, increased CRP, decreased blood calcium, antibiotic use, gamma globulin use, and high disease severity were associated with poor prognosis of PV. However, because univariate analysis does not account for collinearity and confounding among the variables, feature selection was carried out. The mean squared error of the model was smallest when the number of variables was five, so five variables (mucosal involvement, albumin, ESR, CRP, and disease severity) were retained for subsequent model construction. The data after feature selection were randomly divided into a training set (70%) and a test set (30%). The RF model ranked the risk factors influencing PV prognosis as follows: decreased albumin, elevated ESR, increased CRP, high disease severity, and mucosal involvement (P<0.05). The traditional logistic model identified the following risk factors for PV prognosis: decreased albumin, elevated ESR, and high disease severity (P<0.05). 5. For the RF model, the accuracy, sensitivity, specificity, and F1 score were 100%, 100%, 100%, and 1.00, respectively, in the training set, and 90.91%, 77.78%, 100%, and 0.875, respectively, in the test set. For the logistic model, the accuracy, sensitivity, specificity, and F1 score were 92.59%, 100%, 91.49%, and 0.7778, respectively, in the training set, and 81.82%, 66.67%, 92.31%, and 0.75, respectively, in the test set. Overall, the RF model showed good predictive performance.

Conclusion: 1. The RF prediction model can accurately predict the prognostic risk factors of PV and performs better than the logistic regression model. 2. The RF model showed that decreased serum albumin, elevated ESR, increased CRP, higher disease severity, and mucosal involvement of the rash were risk factors affecting the prognosis of PV. The traditional logistic model ranked the risk factors affecting the prognosis of PV as lower serum albumin, faster ESR, and higher disease severity.
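
As an illustrative check of how the reported accuracy, sensitivity, specificity, and F1 values relate to one another, the following sketch computes all four from confusion-matrix counts. The abstract does not report the confusion matrices. The counts used here (a test set of 22 patients, 9 of whom died) are assumptions back-calculated to be consistent with the reported test-set percentages, and the names `Confusion` and `metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts, with death within 1 year as the positive class."""
    tp: int  # deaths predicted as deaths
    fp: int  # survivors predicted as deaths
    fn: int  # deaths predicted as survivors
    tn: int  # survivors predicted as survivors


def metrics(c: Confusion) -> dict:
    """Return accuracy, sensitivity, specificity, and F1 for one confusion matrix."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),   # recall on the positive class
        "specificity": c.tn / (c.tn + c.fp),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# ASSUMED counts, not taken from the source: chosen so that the metrics
# match the test-set percentages reported in the abstract.
rf_test = metrics(Confusion(tp=7, fp=0, fn=2, tn=13))
logit_test = metrics(Confusion(tp=6, fp=1, fn=3, tn=12))

for name, m in (("RF", rf_test), ("Logistic", logit_test)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Under these assumed counts, the computed values round to the reported test-set figures for both models. In this reconstruction, the gap between the models comes down to one additional death detected and one fewer false alarm for the RF model.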