| Background and objective: Breast cancer is the most common malignant tumor in women worldwide, and its incidence is increasing year by year. The tumor microenvironment (TME) of breast cancer contains extensive immune cell infiltration. As an important component of the TME, B cells play a key role in the occurrence, development, and anti-tumor therapy response of breast cancer. In this study, B cell characteristic genes in breast cancer were identified through analysis of single-cell sequencing data from the GEO database, and a related prognostic risk model was constructed, with the aim of contributing to precision treatment of breast cancer patients, helping to evaluate their survival prognosis, and guiding clinical decision-making. Methods: Gene expression profiles and clinical data of patients in the TCGA-BRCA cohort were downloaded from the TCGA website. The level of B cell infiltration in each sample was evaluated with the EPIC, MCPcounter, and xCell deconvolution tools. Weighted gene co-expression network analysis (WGCNA) was used to identify core genes related to the level of B cell infiltration, followed by GO and KEGG enrichment analysis of these genes. The single-cell RNA sequencing dataset GSE161529 was downloaded from the GEO database and analyzed to identify B cell characteristic genes. Univariate Cox regression and LASSO-Cox multivariate regression were used to identify independent prognostic genes and construct a B cell-related risk model for breast cancer. Three GEO datasets (GSE96058, GSE39004, and GSE20685) were downloaded for external validation of the risk score model. The correlation between the B cell risk score and clinicopathological features of breast cancer was analyzed with the Wilcoxon test (wilcox.test). The ssGSEA algorithm was used to evaluate the level of immune cell infiltration in the different risk groups, and the TIDE algorithm was used to predict patients' response to immunotherapy. The expression levels of immune checkpoints in the different risk groups were compared
by between-group t-test. The oncoPredict package was used to analyze drug sensitivity and evaluate the correlation between risk score and drug sensitivity. The maftools package was used to calculate the tumor mutation burden (TMB). GO, KEGG, and GSEA enrichment analyses were performed for the different risk groups. The "rms" package was used to construct a nomogram prognostic model based on the B cell risk score, and the model was validated internally and externally by the C-index, ROC curve, calibration curve, and decision curve analysis (DCA). Results: The EPIC, MCPcounter, and xCell deconvolution algorithms all showed that breast cancer patients with a high level of B cell infiltration had a better prognosis. WGCNA identified 179 core genes closely related to the level of B cell infiltration. GO and KEGG enrichment analysis showed that these genes were significantly enriched in immune-related signaling pathways. Next, 73 B cell characteristic genes were obtained by analyzing the breast cancer single-cell sequencing dataset GSE161529. Univariate Cox regression showed that 24 B cell characteristic genes were associated with the prognosis of breast cancer patients. Five independent prognostic genes (JCHAIN, CD52, RAC2, BTG1, EZR) were identified by LASSO-Cox multivariate regression. Among them, EZR is a risk gene (HR=1.42), and the other four are protective genes (HR<1). The breast cancer B cell-related risk score (BCMG) was constructed from these five independent prognostic genes. The formula is: BCMG = (0.075 × JCHAIN expression) + (-0.018 × CD52 expression) + (-0.167 × RAC2 expression) + (-0.078 × BTG1 expression) + (0.169 × EZR expression). In TCGA and the three external validation cohorts (GSE96058, GSE39004, GSE20685), the prognosis of high-risk patients was significantly worse (p<0.05). The area under the ROC curve showed that the BCMG score has good ability to predict prognosis. Clinical correlation analysis showed that the BCMG scores of patients of advanced age (>65
years old), patients with higher T stage, and patients with the Luminal B molecular subtype were relatively higher. Compared with the low-risk group, the expression levels of multiple immune checkpoints were relatively lower in the high-risk group. The level of immune cell infiltration in the high-risk and low-risk groups was estimated with the ssGSEA algorithm. The infiltration level of many immune cell types, including activated B cells, CD4+ T cells, CD8+ T cells, and natural killer cells, was higher in the high-risk group, whereas the infiltration level of activated dendritic cells was higher in the low-risk group; macrophages showed no significant difference between the two groups. Patients in the high-risk group had relatively lower TIDE scores, suggesting that these patients might respond better to immunotherapy. The "oncoPredict" package was used to predict sensitivity to a range of common chemotherapy and endocrine drugs for breast cancer, and drug sensitivity was found to be positively correlated with BCMG score. Compared with the low-risk group, the high-risk group had a higher tumor mutation burden (p=0.049). Age, TNM stage, molecular subtype, and BCMG score were included in multivariate analysis, which found that age, N stage, and BCMG score were independent prognostic factors for breast cancer patients. Based on this, a nomogram prediction model was constructed to predict the 1-, 3-, and 5-year overall survival of breast cancer patients. Validation of the nomogram showed good discrimination and accuracy in both the internal and external validation cohorts. Conclusion: The level of B cell infiltration in breast cancer is significantly related to patient prognosis. In this study, B cell marker genes were identified by analyzing single-cell sequencing data, and a related risk score model was constructed, which can accurately predict the prognosis of breast cancer patients, help guide clinical diagnosis and
treatment, and provide a basis for individualized treatment of breast cancer patients. |
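
The BCMG formula stated in the Results is a weighted sum of five gene expression values, so it can be illustrated with a short sketch. The coefficients below are copied as printed in the abstract, including their signs. The function names, the input layout (a gene-to-expression mapping per sample), and the use of the cohort median as the high/low risk cutoff are assumptions made for illustration; the abstract does not state how expression was normalized or which cutoff the authors used.

```python
import statistics

# Coefficients as printed in the abstract (LASSO-Cox model).
BCMG_COEFFICIENTS = {
    "JCHAIN": 0.075,
    "CD52": -0.018,
    "RAC2": -0.167,
    "BTG1": -0.078,
    "EZR": 0.169,
}


def bcmg_score(expression):
    """Return the BCMG score for one sample.

    `expression` maps gene symbol -> expression value (hypothetical layout;
    the normalization used in the study is not given in the abstract).
    """
    missing = [gene for gene in BCMG_COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in BCMG_COEFFICIENTS.items())


def split_by_median(scores):
    """Label samples 'high' or 'low' risk relative to the cohort median.

    The median cutoff is an assumption, not a detail taken from the abstract.
    """
    cutoff = statistics.median(scores.values())
    return {
        sample: ("high" if score > cutoff else "low")
        for sample, score in scores.items()
    }
```

In use, one would compute `bcmg_score` for every sample in a cohort, collect the results into a dictionary keyed by sample ID, and pass that dictionary to `split_by_median` to obtain the two risk groups compared in the survival analysis.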