| Objectives: Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by chronic inflammation and multi-organ damage. Research has shown a close correlation between the gut microbiota and the occurrence of SLE, so in-depth study of the relationship between gut microbiota imbalance and SLE has significant scientific and clinical implications for its prevention and treatment. The aims of this study were: (1) to identify important gut microbiota in SLE patients through machine learning feature selection, by comparison with a negative control (NC) group; (2) to achieve early identification of SLE patients by combining machine learning algorithms with the important gut microbiota; (3) to use machine learning and explainability techniques to visualize how the predictive factors act, providing a reference for future scientific treatment of SLE. Methods: In this survey study, we enrolled 70 primary SLE patients from a tertiary hospital in Shanxi province between December 2018 and August 2019 as the research subjects. All patients met the classification criteria for SLE revised by the American College of Rheumatology in 1997. In addition, 71 healthy individuals matched for age and gender with the patients were included as the NC group. Basic information and fecal samples were collected from all participants. Peripheral venous blood was also collected from the SLE patients, and cytokines and T cell subtypes were measured using flow cytometry. The fecal samples were sequenced using 16S rRNA sequencing. Species-level microbial data were then obtained from the sequencing data through quality control, clustering, and species annotation, and the alpha and beta diversity of the microbial communities in the SLE and NC groups were analyzed. Next, we used Elastic Net (EN) and Boruta to select important species-level gut microbiota in the SLE group, and the intersection of the two selections was considered the important gut microbiota for 
SLE. On this basis, Spearman correlation analysis was performed to explore the relationship between the important gut microbiota and cytokines and immune cells in SLE patients. In addition, six machine learning algorithms, namely Logistic Regression (LR), Least Absolute Shrinkage and Selection Operator (LASSO), Classification and Regression Tree (CART), Random Forest (RF), Adaptive Boosting (AdaBoost), and Extreme Gradient Boosting (XGBoost), were used to model the important gut microbiota and to investigate whether they could identify individuals at high risk of SLE. The performance of these models was evaluated using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and Decision Curve Analysis (DCA). Finally, the best-performing algorithm was combined with the SHAP interpretability framework to explore how the important gut microbiota affect the risk of developing SLE. Results: (1) The alpha diversity indices, including Chao1, Richness, Sobs, and Shannon, were significantly lower in the SLE group than in the NC group (P < 0.05). Both principal component analysis and non-metric multidimensional scaling analysis showed significant differences in beta diversity between the two groups (P < 0.05). (2) After feature selection, 35 and 28 microbial taxa were retained by the EN and Boruta methods, respectively, with 15 overlapping taxa, which were considered the important gut microbiota in SLE: Faecalibacterium_prausnitzii, Ruminococcus_bromii, Dialister_succinatiphilus, Clostridium_aldenense, Escherichia_fergusonii, Phascolarctobacterium_succinatutens, Bacteroides_fragilis, Eubacterium_eligens, Gemmiger_formicilis, Alistipes_shahii, Eubacterium_hallii, Clostridium_asparagiforme, Roseburia_inulinivorans, Roseburia_intestinalis, and Blautia_wexlerae. (3) Spearman correlation analysis showed that Bacteroides_fragilis was positively correlated with IFN-α (P < 0.05); Blautia_wexlerae was positively correlated with NK cells (P < 0.05); Clostridium_aldenense was negatively 
correlated with Th2/Treg (P < 0.05); Dialister_succinatiphilus was negatively correlated with IL-17 (P < 0.05); Escherichia_fergusonii was positively correlated with TBNK cells and Th cells (P < 0.05), and negatively correlated with IFN-γ and IL-6 (P < 0.05); Eubacterium_eligens was negatively correlated with Treg, Th2, and Th1 (P < 0.05); Eubacterium_hallii was positively correlated with Th17 and negatively correlated with Th1/Treg (P < 0.05); Phascolarctobacterium_succinatutens was positively correlated with IL-2 (P < 0.05). (4) All six machine learning algorithms, combined with the important gut microbiota, were effective in identifying SLE patients, especially XGBoost, whose accuracy, sensitivity, specificity, positive predictive value, and AUC reached 0.905, 0.857, 0.952, 0.947, and 0.905, respectively. The values of the above indicators for RF were 0.881, 0.810, 0.952, 0.944, and 0.881. The LASSO, LR, and AdaBoost models performed moderately, while the CART algorithm performed slightly worse than the other algorithms, reaching 0.786, 0.810, 0.762, 0.773, 0.800, and 0.786, respectively. Additionally, DCA showed that the XGBoost model could bring the greatest clinical benefit to patients, followed by RF. (5) The XGBoost algorithm combined with the SHAP interpretability framework showed a complex non-linear relationship between the relative abundance of the important gut microbiota and the risk of SLE. Among them, Roseburia_intestinalis had the highest SHAP value, and the risk of SLE decreased as the relative abundance of this bacterium increased. Conclusions: (1) There are significant differences in the gut microbiota between the SLE group and the NC group. Faecalibacterium prausnitzii, Eubacterium eligens, and Roseburia intestinalis are much more abundant in the NC group than in the SLE group. On the other hand, Escherichia fergusonii and Bacteroides fragilis are more abundant in the SLE group than in the NC group, indicating 
dysbiosis of the gut microbiota in SLE patients. (2) Machine learning classifiers, especially XGBoost and RF, combined with the important gut microbiota can effectively identify SLE patients at an early stage, providing a valuable reference for clinical decision-making. (3) The SHAP interpretability framework helps to explain the magnitude and direction of the contribution of each important gut microbe to SLE identification, and can provide a reference for improving the scientific treatment of SLE. |
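
The evaluation measures named in the abstract (accuracy, sensitivity, specificity, positive and negative predictive value, and the net benefit that underlies Decision Curve Analysis) all follow from a confusion matrix. The sketch below is a minimal illustration of those standard definitions, not the authors' code. The class `ConfusionCounts`, the two helper functions, and the counts are hypothetical: the abstract does not report the test-set size or confusion matrices, and the counts were picked only because their rounded metrics coincide with the XGBoost figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts at one decision threshold (hypothetical container)."""
    tp: int  # SLE patients classified as SLE
    fn: int  # SLE patients classified as control
    tn: int  # controls classified as control
    fp: int  # controls classified as SLE

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard threshold-based metrics derived from the four counts."""
    return {
        "accuracy": (c.tp + c.tn) / c.total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def net_benefit(c: ConfusionCounts, threshold: float) -> float:
    """Net benefit used in decision curve analysis.

    NB = TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability.
    The counts must come from classifying as positive every subject whose
    predicted probability is at least pt.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)
    return c.tp / c.total - (c.fp / c.total) * odds


# Hypothetical counts, NOT taken from the study.
counts = ConfusionCounts(tp=18, fn=3, tn=20, fp=1)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"net benefit at pt=0.5: {net_benefit(counts, 0.5):.3f}")
```

A decision curve repeats the `net_benefit` calculation over a range of threshold probabilities, recomputing the counts at each threshold, and compares the model with the "treat all" and "treat none" strategies.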