| Background and Objective: The ultimate purpose of comprehensive treatment of breast cancer is to improve patient prognosis. Tumor morphological features and pathological stage are often used to judge the prognosis of breast cancer patients. Common indicators include tumor size, lymph node metastasis, histological grade and clinicopathological stage. With the continuous progress of molecular biology technology, the molecular typing of breast tumors has greatly improved the effectiveness and accuracy of clinical diagnosis, treatment and prognosis evaluation. Tumor mutation burden (TMB) quantifies the accumulation of somatic mutations, and its prognostic value differs across tumor types. High TMB has been associated with shorter overall survival (OS) in bladder cancer, gastric adenocarcinoma and endometrial carcinoma, but with longer OS in head and neck squamous cell carcinoma, renal clear cell carcinoma, resectable non-small cell lung cancer and melanoma. However, there are few reports on the prognostic value of TMB in breast cancer. At present, several restrictions still limit the clinical application of TMB detection. First, the technical cost of detection is high, which makes it difficult to popularize in clinical practice. Second, different platforms cover different genes, so results cannot be verified against previous clinical data. Third, there is no unified standard for grading TMB in breast tumors. In view of this, the main purpose of this study is to explore whether TMB affects the prognosis of breast cancer patients and to establish a TMB-based prognosis model for breast cancer patients. Specifically, in the first part of this study, through a comprehensive analysis of breast cancer mutation data in the TCGA database, we explored the prognostic value of different TMB groups for breast cancer patients, further searched for differentially expressed genes between TMB groups at the transcriptome level, and then screened these genes to establish breast cancer 
prognosis-related models and verified them in multiple independent data sets. Methods: In this study, we mainly used the mutation and transcriptome data of breast cancer samples from the TCGA public database. First, the "maftools" package in R was used to summarize, analyze and visualize the mutation data of TCGA breast cancer samples. The breast cancer patients were then grouped according to the quartiles and the median of mutation frequency. Kaplan-Meier (K-M) survival analysis was performed to explore the effect of TMB on OS in breast cancer patients. Patients in the top 25% of mutation frequency were defined as the high TMB group, and those in the bottom 25% as the low TMB group. The "limma" package was used to identify differentially expressed genes, with screening criteria of |log FC| > 1 and FDR < 0.05. The 1006 breast cancer patients in the TCGA database were randomly divided into a TCGA training set and a test set at a ratio of 5:5. The training set was used to establish the model, and the test set to verify it. Univariate Cox regression analysis was used to screen the differential genes that were significantly associated with OS in breast cancer. 
Then multivariate Cox regression analysis with forward stepwise selection was used to establish the mutation prognosis model (MPM) with the lowest AIC value. The MPM score of each breast cancer patient was calculated with the "predict" function in the "survminer" package. According to the median MPM score, breast cancer patients in the TCGA training set were divided into a high MPM group and a low MPM group. The expression profiles and clinical information of independent breast cancer data sets were downloaded from the GEO database. The validation data sets were corrected against the TCGA transcriptome expression profile using the "voom" function in the "limma" package. The independent data sets were then grouped according to the median MPM score. K-M analysis was used to evaluate OS and disease-free survival of breast cancer patients in the MPM groups. The area under the ROC curve (AUC) was used to evaluate the accuracy of the MPM score in predicting 1-, 3- and 5-year prognosis. The rates of pathological complete response (pCR) to neoadjuvant chemotherapy in the MPM groups were compared with the chi-square (χ²) test. Finally, the "GSVA" package in R was used to analyze the tumor microenvironment based on the transcriptional expression profiles of TCGA breast tumor samples, including the enrichment scores of immune infiltrating cells and immune-related signaling pathways, which were used to compare the tumor microenvironment between MPM groups. Results: Whether breast cancer patients were grouped by the quartiles or by the median of mutation frequency, the prognosis of the high mutation frequency group was worse than that of the low mutation frequency group. Differential expression between the 219 patients in the high mutation group and the 226 patients in the low mutation group was analyzed, and 188 differential genes in the high mutation group were obtained. By 
univariate Cox regression analysis, 24 genes were found to be associated with OS in breast cancer patients. In multivariate Cox regression analysis, the 5 genes giving the minimum AIC value of 667.95 were selected to establish the MPM, whose C-index was 0.746 (SE = 0.026). The MPM score in the TCGA training set ranged from 0.106 to 14.309. According to the median score (1.00), breast cancer patients in the TCGA training set were divided into a high MPM group and a low MPM group. K-M survival analysis showed a difference in survival between the high and low MPM groups in all the data sets used in this study. Breast cancer patients with high MPM had shorter OS or shorter time to recurrence and metastasis, and thus a worse prognosis. The AUC values of the MPM score for predicting the 1-, 3- and 5-year survival rate or recurrence and metastasis rate of breast cancer patients were all about 0.7. In addition, a total of 841 breast cancer patients across three data sets completed the full course of neoadjuvant chemotherapy. Of these, 182 (21.6%) achieved pCR; 397 patients were in the high MPM group and 444 in the low MPM group. pCR was achieved by 119 patients (30.0%) in the high MPM group and by 63 patients (14.2%) in the low MPM group. The pCR rate after neoadjuvant chemotherapy for breast cancer was significantly higher in the high MPM group. The ssGSEA enrichment scores of the tumor microenvironment and the differences in transcriptional expression of multiple immune checkpoints were evaluated. We found that the high MPM group had higher activation of immune-related signaling pathways than the low MPM group, but the enrichment of signaling pathways that inhibit the immune response and the expression of several key immune checkpoint proteins (such as CTLA4, PD1 and PD-L1) were also increased, indicating that immunosuppression may be present at the same time. Finally, we found that the expression level of 
CTLA4 before treatment was higher in patients who achieved pCR after neoadjuvant chemotherapy than in non-pCR patients. Conclusion: 1. Among breast cancer patients, those with high TMB had shorter OS. 2. The MPM established based on TMB shows good applicability, stability and accuracy in predicting the prognosis of breast cancer patients. 3. The likelihood of pCR after neoadjuvant chemotherapy was higher in the high MPM group. |
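
The pCR comparison reported above can be checked from the published counts alone. The following is a minimal sketch in Python, not the study's own code (the study used R). It assumes a Pearson chi-square test on a 2x2 table without continuity correction; the abstract does not say which variant was used. The function name `pearson_chi2_2x2` is hypothetical, and the non-pCR counts are derived here by subtraction (397 - 119 and 444 - 63).

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for
    the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper tail probability of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts taken from the abstract; non-pCR counts are derived by subtraction.
high_pcr, high_total = 119, 397
low_pcr, low_total = 63, 444

# pCR rates per group and overall, in percent.
high_rate = 100 * high_pcr / high_total
low_rate = 100 * low_pcr / low_total
overall_rate = 100 * (high_pcr + low_pcr) / (high_total + low_total)

chi2, p_value = pearson_chi2_2x2(
    high_pcr, high_total - high_pcr,
    low_pcr, low_total - low_pcr,
)

print(f"pCR rate, high MPM: {high_rate:.1f}%")
print(f"pCR rate, low MPM:  {low_rate:.1f}%")
print(f"pCR rate, overall:  {overall_rate:.1f}%")
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

Under these assumptions the group rates reproduce the reported percentages, and the statistic lies well above the 3.84 critical value for significance at the 0.05 level with one degree of freedom, which is consistent with the abstract's statement that the pCR rate was significantly higher in the high MPM group.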