| OBJECTIVE: To construct and validate a copy number alteration-based prognostic model for early breast cancer using machine learning algorithms. METHODS: Three clinical cohorts were included: the METABRIC cohort, the TCGA-BRCA cohort, and the CICAMS cohort. The METABRIC cohort was split 1:1 into a training cohort and an internal validation cohort. The training cohort was divided into event and control groups according to whether recurrence, metastasis, or death occurred within 5 years, and genes that differed significantly between the two groups were selected by corrected chi-square test (Gene set 1). Genes were also screened by matching the mutation information (and copy number change information) and the mRNA expression information of each gene by official gene ID, and by whether the two were correlated (Gene set 2). The final prognosis-related gene set was the intersection of Gene set 1 and Gene set 2. Patient prognostic risk scores were obtained using LASSO regression and were applied separately to the METABRIC internal validation cohort, the TCGA-BRCA cohort, and the CICAMS cohort for subsequent survival analyses. All analyses were performed in R 4.1.0, and survival curves were plotted with GraphPad Prism 6.0. RESULTS: A total of 2295 patients with early-stage breast cancer were included in the study: 1427 from the METABRIC cohort, 837 from the TCGA-BRCA cohort, and 31 from the CICAMS cohort. A total of 6 prognosis-related genes were included after covariance determination analysis. The prognostic risk of all patients in the TCGA-BRCA and CICAMS cohorts was obtained by LASSO regression, and patients were divided into two groups: a low to normal risk group (LNRG) and a high-risk group (HRG). There was a statistically significant difference in RFS between LNRG and HRG in both the TCGA-BRCA and CICAMS cohorts (TCGA-BRCA: HR=1.52, 95% CI: 1.06-2.17, P=0.021; CICAMS: HR=5.93, 95% CI: 1.08-32.72, P=0.041). Calibration curve and decision curve analysis suggested an excellent fit of the model. CONCLUSION: In this study, a prognostic model for early breast cancer based on somatic copy number alteration was developed, which performed well in both the internal validation set and the external validation sets, including one from the Chinese population. |
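
The gene-selection and risk-grouping steps in METHODS can be illustrated with a minimal Python sketch. It is not the authors' R pipeline. It assumes that the "corrected chi-square test" is a Yates continuity-corrected test on a 2x2 table (altered/unaltered by event/control), that the LASSO coefficients have already been fitted elsewhere, and that patients are split at a fixed score cutoff. The gene names, counts, coefficients, 0.05 threshold, and cutoff below are all hypothetical placeholders.

```python
import math


def yates_chi2_p(a, b, c, d):
    """P-value of a Yates-corrected chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / denom
    # Survival function of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def risk_score(profile, coefs):
    """Linear risk score: sum of coefficient * alteration value over the model genes."""
    return sum(w * profile.get(gene, 0) for gene, w in coefs.items())


def risk_group(score, cutoff):
    """Assign HRG above the cutoff, LNRG otherwise (the cutoff is an assumption)."""
    return "HRG" if score > cutoff else "LNRG"


# Hypothetical counts per gene:
# (altered & event, unaltered & event, altered & control, unaltered & control)
tables = {
    "GENE_A": (30, 20, 10, 40),
    "GENE_B": (25, 25, 25, 25),
    "GENE_C": (35, 15, 12, 38),
}

# Gene set 1: genes whose alteration frequency differs between event and control groups
gene_set_1 = {g for g, t in tables.items() if yates_chi2_p(*t) < 0.05}

# Gene set 2: genes whose alteration correlates with mRNA expression (hypothetical result)
gene_set_2 = {"GENE_A", "GENE_B"}

# Final prognosis-related gene set: intersection of the two
final_genes = gene_set_1 & gene_set_2

# Hypothetical, already-fitted LASSO coefficients for the model genes
coefs = {"GENE_A": 0.5, "GENE_B": -0.25}
patient = {"GENE_A": 1, "GENE_B": 1}
group = risk_group(risk_score(patient, coefs), cutoff=0.0)
```

In practice the coefficients would come from a penalized regression fit on the training cohort, and the cutoff would be fixed before the score is applied to the validation cohorts.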