| Objective: In this study, inpatients with generalized anxiety disorder (GAD) and panic disorder (PD) were selected as subjects, and a database of anxiety disorders was constructed from socio-demographic characteristics, clinical characteristics and biological indicators. The random forest (RF) approach was used to explore clinical and biological markers that predict the tendency toward chronicity of anxiety disorders, to provide simple and feasible predictors for early clinical identification, and to provide a reference for personalized precision medicine for GAD and PD. Methods: 1. Construction of a database of anxiety disorders: Clinical data of 1103 inpatients with GAD or PD who were admitted to the Department of Psychiatry of the First Affiliated Hospital of Zhengzhou University from May 2014 to May 2021 were retrospectively collected to establish a database covering socio-demographic characteristics, clinical characteristics and biological indicators. 2. Statistical analyses: General data analyses were conducted with SPSS statistical software (version 26.0), and the two-tailed statistical significance level was set at P<0.05. The prediction model was built on the Balanced Random Forest classifier using the Python programming language (version 3.7.6). Ten-times-repeated 10-fold cross-validation was used to assess the fitting performance of the model, and the variable importance ranking was obtained to screen the best predictor variables. Results: 1. Screening of patients with chronic anxiety disorders: According to the inclusion and exclusion criteria, 1121 inpatients with GAD or PD were included in total, and disease duration was non-normally distributed. Only 18 inpatients with a disease duration of 1 to 2 years were excluded. A total of 429 cases with GAD or PD lasting at least two years without remission were defined as the chronic group (GAD/PD-C), 674 cases with a disease duration of no more than one year were assigned to the non-chronic group (GAD/PD-NC), and a total
of 1103 cases were eventually included. 2. Comparison of general and clinical information between the two groups: Comparing socio-demographic and clinical characteristics between the 429 cases with GAD/PD-C and the 674 cases with GAD/PD-NC showed statistically significant differences in gender, age, disease subtype, age of onset, presence of a precipitating factor, premorbid personality tendencies, systematic drug therapy and family history of psychiatric disorders (P<0.05). There was no significant difference in the other characteristics between the two groups (P>0.05). Comparing biological indicators between the GAD/PD-C group and the GAD/PD-NC group showed statistically significant differences in the levels of CRP, NLR, ACTH 16:00, Cor 16:00, TT4, FT4, PRL, E2, PROG, TESTO, Urea, T-CHO, TG, APOB, LDL and ACE (P<0.05). There was no significant difference in the remaining indicators (P>0.05). 3. Constructing RF prediction models and calculating variable importance: All clinical data were divided into three domains: socio-demographic characteristics, clinical characteristics, and biological indicators (i.e., inflammatory, endocrine and metabolic levels). Only variables with no more than 30% missing values were included, and multiple imputation was used to obtain a complete data set, leaving a total of 50 predictor variables. The AUC value, accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of each domain were calculated from the confusion matrix, and AUC values were used to measure the performance of the prediction models. First, when disease subtype (GAD or PD) was included as one of the predictors, ranking variable importance for the RF prediction model of GAD/PD-C showed that almost all clinical variables, except age of onset and age, contributed less than the biological variables. We also found that the AUC value in the domain of clinical features outperformed that in the
domain of biological indicators, while the AUC value in the domain of socio-demographic characteristics was the lowest. The combination of all domains performed better than any single domain, with an AUC value of 0.65, accuracy of 65%, sensitivity of 63%, specificity of 66%, PPV of 0.55 and NPV of 0.74. The subtypes GAD and PD were also analyzed separately. Ranking variable importance for the RF prediction model of GAD-C showed that age of onset contributed the most. The AUC value in the domain of clinical features slightly outperformed that in the domain of biological indicators. The combination of all domains performed better than any single domain, with an AUC value of 0.65, accuracy of 65%, sensitivity of 65%, specificity of 66%, PPV of 0.58 and NPV of 0.73. For the RF prediction model of PD-C, ranking variable importance showed that the FT4 level contributed the most. The AUC values for inflammatory and metabolic levels in the biological domain were slightly higher, followed by that of the domain of clinical characteristics. However, the combination of all domains reached an AUC value of only 0.57, with accuracy of 57%, sensitivity of 57%, specificity of 57%, PPV of 0.41 and NPV of 0.72. 4. Optimizing RF prediction models and screening the best predictor variables: Although a variety of variables were involved, the prediction accuracy of the initial comprehensive model of GAD/PD-C was only 65%. To optimize the prediction models, this study gradually excluded the least important variables according to the variable importance ranking. For the prediction models of GAD/PD-C, favorable predictive value first appeared when the top 12 predictor variables (i.e., age of onset, age, FT4, TESTO, HCY, PRL, ACE, UA, TSH, PLT, PROG, and NLR) were included in order. The AUC value was 0.72 (>0.70), and the accuracy, sensitivity and specificity were 72%, 68% and 75% (all >60%), respectively. When only the top two predictor variables (i.e., age of onset and
age) were included, the prediction model had the greatest predictive value, with an AUC value of 0.97, 97% accuracy, 97% sensitivity, 97% specificity, 0.95 PPV and 0.98 NPV. For the prediction models of GAD-C, favorable predictive value first appeared when the top 14 predictor variables (i.e., age of onset, age, PRL, HCY, FT4, TESTO, CRP, LH, ACE, Urea, NLR, UA, TSH, and PROG) were included in order. The AUC value was 0.71 (>0.70), and the accuracy, sensitivity and specificity were 71%, 70% and 72% (all >60%), respectively. When only the top two predictor variables (i.e., age of onset and age) were included, the prediction model had the greatest predictive value, with an AUC value of 0.96, 96% accuracy, 94% sensitivity, 97% specificity, 0.96 PPV and 0.96 NPV. For the prediction models of PD-C, comparing the parameters of the optimized models identified no good prediction model. The AUC value was highest when the top 11 predictor variables (i.e., FT4, SOD-1, TESTO, CREA, Mono, age of onset, PLT, TG, CRP, ACE, and UA) were included in order, but it reached only 0.61, with accuracy, sensitivity and specificity of 62%, 57% and 64%, respectively. Conclusions: 1. Earlier age of onset and older age may be significant predictors of the chronicity of GAD or PD. 2. Including biological factors improves the prediction accuracy of the comprehensive model to some extent, implying that immune, endocrine and metabolic disorders may affect the trajectory toward chronicity of GAD or PD; low thyroid hormone levels may be a more sensitive predictor of chronicity. 3. Compared with GAD, the prediction models for the chronicity of PD have lower predictive value, indicating that its mechanism may be more complex and multifactorial. |
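
The abstract reports accuracy, sensitivity, specificity, PPV and NPV derived from a confusion matrix. As a minimal sketch of how these quantities relate, the snippet below computes them from four cell counts. The class name `ConfusionMatrix`, the function `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the study's data or code, and "positive" is assumed here to mean the chronic group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts of a binary confusion matrix (positive = chronic, by assumption)."""
    tp: int  # chronic cases predicted as chronic
    fn: int  # chronic cases predicted as non-chronic
    fp: int  # non-chronic cases predicted as chronic
    tn: int  # non-chronic cases predicted as non-chronic


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Return the standard metrics as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. both classes are present
    and the model predicts each class at least once.
    """
    total = cm.tp + cm.fn + cm.fp + cm.tn
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # true positive rate
        "specificity": cm.tn / (cm.tn + cm.fp),  # true negative rate
        "ppv": cm.tp / (cm.tp + cm.fp),          # positive predictive value
        "npv": cm.tn / (cm.tn + cm.fn),          # negative predictive value
    }


# Illustrative counts only (not taken from the study).
example = ConfusionMatrix(tp=60, fn=40, fp=50, tn=150)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.2f}")
```

Because PPV and NPV depend on the proportion of positive cases in the sample, they can differ noticeably from sensitivity and specificity when the two groups are unequal in size, as they are here (429 chronic versus 674 non-chronic cases).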