| Objectives: This study aims to optimize the treatment of sepsis by integrating modern machine learning techniques with the traditional Chinese medicine (TCM) method of tonifying qi, invigorating blood circulation, and detoxification (Yi-Qi Huo-Xue Jie-Du). Through systematic evaluation and unsupervised clustering analysis of large databases, we identify potential subtypes of sepsis and analyze their characteristics, mapping them to the TCM subtype of Qi-deficiency and Blood-stasis sepsis. We then develop and validate a semi-supervised learning model that combines self-training techniques with a random forest classifier to assess the applicability of TCM treatment methods in managing sepsis. Finally, we compare the efficacy of different treatment regimens through randomized controlled trials to explore the potential of machine learning-assisted clinical decision tools in enhancing the treatment outcomes of sepsis. Methods: 1. Qualitative Systematic Review: Studies on sepsis subtypes were identified in databases such as PubMed, Web of Science, and Cochrane using predefined search terms and inclusion/exclusion criteria, and the relevant data were extracted and analyzed. 2. Unsupervised Machine Learning Clustering Analysis: Clinical features recorded within 24 hours of sepsis diagnosis were extracted from the MIMIC-IV database. After the distribution and missingness of the data were evaluated, the selected variables were imputed using the K-nearest neighbors method and standardized. The optimal number of inherent phenotypes within the dataset was determined using silhouette coefficients, followed by cluster analysis using the k-means algorithm to identify potential sepsis subtypes. Principal component analysis was used to visualize the clustering results, and differences in clinical characteristics among the sepsis subtypes were analyzed statistically. 3. Subtype Mapping Analysis: A database of patients with Qi-deficiency and Blood-stasis type sepsis was constructed based on 
the electronic medical record management system of the Intensive Care Unit at the First Affiliated Hospital of Heilongjiang University of Chinese Medicine. After feature selection and data preprocessing, a supervised learning model based on random forests was used for the subtype prediction and classification of Qi-deficiency and Blood-stasis type sepsis patients, and to identify patients from the MIMIC-IV database potentially belonging to this subtype. The model used data from sepsis patients with known subtypes in the MIMIC-IV database as the training set. Model parameters were optimized through cross-validation, and Shapley additive explanations (SHAP) were used for interpretative analysis to enhance the model’s accuracy and interpretability. 4. Traditional Chinese Medicine Treatment Suitability Analysis: A preliminary semi-supervised learning model based on a random forest classifier was constructed using data from patients in the Qi-deficiency and Blood-stasis type sepsis database who underwent Qi-boosting, Blood-invigorating, and detoxifying treatment. The model used clinical data of patients at the initiation points of this treatment as the training set, with uniform category labels. The primary model was then used to identify, within the MIMIC-IV database, initiation points whose clinical characteristics resembled those of patients treated with the Qi-boosting, Blood-invigorating, and detoxifying method, including patients potentially belonging to the Qi-deficiency and Blood-stasis type who were undergoing antibiotic treatment, mechanical ventilation, and continuous renal replacement therapy. High-confidence predictions were incorporated into the training set through self-training, and the model was optimized iteratively. Cosine similarity was used to assess the similarity between data points, with 0.7 set as the similarity threshold; patient data exceeding this threshold were considered similar. All numerical features were standardized using Standard 
Scaler, and the Synthetic Minority Oversampling Technique (SMOTE) was used to address data imbalance. After data preprocessing and balancing, the random forest algorithm was used for model training and optimization. The model’s performance was evaluated based on precision and recall, with its diagnostic accuracy displayed through ROC curves and AUC values, and predictive details in practical applications revealed by confusion matrices. Clinical features influencing the model’s predictive capability were identified based on their contribution to the decrease in Gini impurity within the decision trees. Results: 1. Qualitative Systematic Review: Of 7,598 papers initially screened, 39 were included in the analysis based on predefined criteria. The distribution of publication years was as follows: 2 each in 2016 and 2018, 5 each in 2019 and 2020, 8 in 2021, 15 in 2022, and 2 in 2023. By sample size, 7 studies had no more than 250 participants, 5 had 251 to 500, 1 had 501 to 1,000, 5 had 1,001 to 2,000, 8 had 2,001 to 5,000, 7 had 5,001 to 10,000, 2 had 10,001 to 20,000, and 4 had 20,001 or more. Studies were divided into two major categories based on sample size: less than 1,000 and 1,000 or greater. In studies with sample sizes less than 1,000, latent class analysis (LCA) was used 3 times, hierarchical clustering (HC) twice, and hidden Markov models (HMM), k-means clustering (KMC), and latent profile analysis (LPA) once each. In studies with sample sizes of 1,000 or greater, KMC was applied 8 times, LCA 3 times, HC and LPA twice each, and other machine learning methods once each. The number of subtypes identified ranged from 2 to 20, with most studies focusing on 3 to 5 subtypes. The clinical features used for unsupervised clustering analysis fell into 9 major categories, including basic and clinical characteristics, laboratory test indicators, vital signs and important physiological indicators, and intensive care scoring, among others. "Basic and clinical characteristics" were mentioned in 12 studies, with a usage rate of 61.5%; "Vital signs and important 
physiological indicators" were mentioned in 16 studies, with a usage rate of 76.9%. Traditional Chinese Medicine syndromes were mentioned least often, appearing in only one study, with a usage rate of 3.8%. 2. Unsupervised Machine Learning Clustering Analysis: Patients with sepsis in the MIMIC-IV database were clustered into four subtypes: Subtype 1, Subtype 2, Subtype 3, and Subtype 4. Subtype 1 patients exhibited a low mortality rate, short hospital and ICU stays, and mild organ dysfunction, associated with better oxygen saturation, stable lactate levels, and a mild comorbidity burden. Subtype 2 patients were characterized by a high mortality rate, increased SOFA scores, and prolonged ICU stays, accompanied by severe multi-organ dysfunction and high comorbidity risk, significant lactate elevation, and decreased pH levels. Subtype 3 patients had a mortality rate between those of Subtypes 1 and 2 and the highest average age, with a correspondingly higher Charlson comorbidity index. Subtype 4 patients, while similar to Subtype 1 in SOFA and GCS scores, indicating milder organ dysfunction, had a mortality rate close to that of Subtype 3. Coagulation function indicators and inflammation markers in Subtype 4 showed moderate abnormalities. 3. Subtype Mapping Analysis: A supervised learning model based on the random forest algorithm was developed, demonstrating an accuracy of 87.10%, precision of 87.18%, and recall of 87.10% on the sepsis patient test set within the MIMIC-IV database. Feature importance analysis identified key biomarkers impacting subtype classification, including the minimum and maximum values of serum creatinine and average heart rate. When the model was applied to the Qi-deficiency and Blood-stasis type sepsis patient dataset for subtype prediction, 97 patients were classified into Subtype 1, representing 39.4% of the dataset; 3 patients into Subtype 2, representing 1.2%; and 72 and 74 patients into Subtypes 3 and 4, respectively, accounting for 29.3% and 30.1% of the 
dataset. 4. Traditional Chinese Medicine Treatment Suitability Analysis: A semi-supervised learning model combining self-training techniques with a random forest classifier performed well in predicting whether patients are suitable for Qi-boosting, Blood-invigorating, and detoxifying herbal treatment. For identifying patients unsuitable for this treatment, the model achieved precision, recall, and F1 scores of 89%, 98%, and 93%, respectively. In contrast, for identifying patients suitable for treatment, the respective metrics were 89%, 58%, and 70%, with an overall accuracy of 88.87%, indicating good overall model performance. The confusion matrix showed 6274 true negatives and 1097 true positives, with 135 false positives and 788 false negatives, indicating that the model was more reliable at determining unsuitability for treatment. The area under the ROC curve (AUC) was 0.93, reflecting the model’s high discriminative ability in assessing patient suitability for Qi-boosting, Blood-invigorating, and detoxifying treatment. Feature importance analysis identified the minimum value of PT, the maximum value of PTT, and the maximum value of BUN as key predictive factors for treatment suitability. 5. Clinical Study: Before treatment, SOFA scores, GCS scores, Traditional Chinese Medicine (TCM) syndrome scores, serum lactate, pH values, WBC, ANC, NLR, PT, and PTT were compared among three groups (TCM-DTG, ML-ATCMG, and CWMTG), with no statistically significant difference between the groups (P>0.05). After treatment, the TCM syndrome scores in the TCM-DTG and ML-ATCMG groups were significantly lower than in the CWMTG group (P<0.05), with effectiveness rates based on TCM syndrome scores of 93.02%, 80.00%, and 43.90%, respectively, a statistically significant difference (P<0.05). Post-treatment, the TCM-DTG and ML-ATCMG groups showed significant improvement over the CWMTG group in SOFA scores, serum lactate, NLR, PT, and PTT (P<0.05), while there was no 
statistically significant difference in GCS scores, pH values, WBC, or ANC among the three groups (P>0.05). In the comparison of indicators between TCM-DTG and ML-ATCMG, although ML-ATCMG performed better overall than TCM-DTG, the variability of ML-ATCMG’s data was greater, indicating larger differences in treatment outcomes among individuals and thus less stable therapeutic effects for ML-ATCMG than for TCM-DTG. Conclusion: 1. For the analysis of sepsis subtypes within the MIMIC-IV database, the applicable unsupervised clustering algorithm is KMC, with the optimal preset range for the number of clusters (K) being 3 to 7. The multimodal clinical diagnostic parameters used for clustering analysis primarily include three major categories of data: "Laboratory Test Indicators," "Basic and Clinical Characteristics," and "Vital Signs and Important Physiological Indicators." 2. Analysis of the MIMIC-IV database revealed four potential sepsis subtypes: Subtype 1-Primary Adaptive Sepsis Subtype; Subtype 2-Severe Progressive Sepsis Subtype; Subtype 3-Complex Aging Sepsis Subtype; and Subtype 4-Complex Dynamic Sepsis Subtype. 3. Patients with Qi-deficiency and Blood-stasis type sepsis were distributed mainly across sepsis subtypes 1, 3, and 4, indicating that, aside from subtype 2, patients with sepsis in the MIMIC-IV database may exhibit characteristics of Qi-deficiency and Blood-stasis. 4. The Traditional Chinese Medicine (TCM) auxiliary decision-making tool, developed using a semi-supervised learning model, demonstrated excellent performance in assessing the suitability of Qi-boosting, Blood-invigorating, and detoxifying TCM treatment plans. 5. Capsule No. 2 formula showed significant effects in reducing SOFA scores, improving tissue perfusion and coagulation function, and regulating immune function in patients with Qi-deficiency and Blood-stasis type sepsis. The clinical application of Capsule No. 2 formula, guided by the TCM auxiliary decision-making 
tool, effectively enhanced the overall therapeutic efficacy. |
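
The per-class metrics reported for the treatment suitability model can be cross-checked against the confusion-matrix counts given in the Results (6274 true negatives, 1097 true positives, 135 false positives, 788 false negatives). The following is a minimal sketch of that arithmetic only; the helper name `class_metrics` is hypothetical and not part of the study's code, and "suitable for treatment" is assumed to be the positive class.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts as reported in the abstract (positive class = suitable for treatment).
tn, tp, fp, fn = 6274, 1097, 135, 788

# Positive class: suitable for treatment.
suitable = class_metrics(tp, fp, fn)

# Negative class: swap roles, so true negatives act as the "hits",
# false negatives as false alarms, and false positives as misses.
unsuitable = class_metrics(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

print("suitable   (P, R, F1):", [round(100 * m) for m in suitable])
print("unsuitable (P, R, F1):", [round(100 * m) for m in unsuitable])
print("accuracy (%):", round(100 * accuracy, 2))
```

Rounded to whole percentages, these counts reproduce the reported 89%, 58%, and 70% for the suitable class, 89%, 98%, and 93% for the unsuitable class, and the overall accuracy of 88.87%. The low recall for the suitable class follows directly from the 788 false negatives.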