| Breast cancer, the cancer with the highest incidence among women worldwide, has a serious impact on women's lives. At present, early screening remains the most effective way to control the development of breast cancer, but the lack of accurate biomarkers still makes early diagnosis very difficult. It is therefore necessary to explore the molecular mechanisms involved in the occurrence and development of breast cancer in order to discover new candidate genes that can improve early diagnosis and treatment decisions. This paper first analyzes breast cancer gene expression data to find breast cancer-related genes and selects those significantly related to prognosis as biomarkers; second, these genes are used as a whole to construct a prognostic model; finally, the model is used to evaluate the prognosis of breast cancer patients and thereby improve the quality of breast cancer prediction. The main research contents of this paper are as follows: (1) This paper proposes a DO-UNIBIC method for selecting related genes. Disease ontology analysis alone cannot find potential breast cancer-related genes and does not make effective use of gene expression data. To address this, the method first uses disease ontology analysis to select breast cancer-related genes from the genes differentially expressed in breast cancer, and then uses the UNIBIC algorithm, which is based on the longest common subsequence, to find all gene clusters in the expression data that share the same change trend. Experiments show that the gene clusters in which the results of the two algorithms intersect contain both known breast cancer-related genes and potentially related genes, so a more comprehensive set of breast cancer-related genes can be screened from the differentially expressed genes as candidate genes for prognostic analysis. (2) This paper builds an eight-gene prognostic model to evaluate the prognostic risk of breast cancer patients. In this
paper, multivariate Cox proportional hazards regression was used to analyze the prognostic value of the candidate gene set, yielding 8 genes significantly related to prognosis: ACSL1, CD24, EMP1, JPH3, CAMK4, JUN, S100B, and TP53AIP1. The multivariate Cox regression results show that ACSL1, CD24, EMP1, and JPH3 are risk factors whose high expression has an adverse effect on prognosis, while CAMK4, JUN, S100B, and TP53AIP1 are protective factors whose high expression is associated with a better prognosis. Among them, ACSL1 is a potential breast cancer-related gene screened by the method proposed in this paper. Finally, the eight genes are used as a whole to construct an eight-gene prognostic model, and the overall risk score is used to evaluate the prognosis of breast cancer patients. The results show that the model's evaluation is reliable. |
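
The longest-common-subsequence idea behind the clustering step in (1) can be illustrated with a minimal Python sketch. It is not the UNIBIC implementation. It assumes that each gene's samples are ranked by expression and that two genes share a change trend over the samples appearing in the same relative order in both rankings. The helper names `column_order` and `longest_common_subsequence`, and the expression values, are made up for illustration.

```python
from typing import List, Sequence


def column_order(expression: Sequence[float]) -> List[int]:
    """Return sample (column) indices sorted by increasing expression."""
    return sorted(range(len(expression)), key=lambda j: expression[j])


def longest_common_subsequence(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Classic dynamic-programming LCS, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    result: List[int] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Made-up expression values for two hypothetical genes over six samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [0.5, 1.5, 2.5, 9.0, 3.5, 4.5]

# Samples over which both genes rise in the same order.
shared = longest_common_subsequence(column_order(gene_a), column_order(gene_b))
print(shared)
```

In this toy case the two genes agree on the ordering of every sample except sample 3, so the shared trend covers the remaining five samples. A biclustering method would extend such pairwise seeds to larger groups of genes.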
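
The overall risk score in (2) can be sketched as a weighted sum of the eight genes' expression values, with the Cox regression coefficients as weights. The sketch below is an illustration under stated assumptions, not the paper's fitted model. The coefficient values are placeholders, and only their signs follow the abstract (positive for risk factors, negative for protective factors). Gene symbols are written in their standard form (CAMK4, S100B). The median cutoff used to split patients into high- and low-risk groups is a common convention assumed here and is not stated in the abstract.

```python
from statistics import median
from typing import Dict, List

# Placeholder coefficients (NOT the paper's values). Only the signs follow
# the abstract: positive = risk factor, negative = protective factor.
COEFFICIENTS: Dict[str, float] = {
    "ACSL1": 0.30, "CD24": 0.20, "EMP1": 0.25, "JPH3": 0.15,
    "CAMK4": -0.30, "JUN": -0.20, "S100B": -0.10, "TP53AIP1": -0.25,
}


def risk_score(expression: Dict[str, float]) -> float:
    """Linear predictor: sum of coefficient * expression over the eight genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(scores: List[float]) -> List[str]:
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


RISK_GENES = ("ACSL1", "CD24", "EMP1", "JPH3")

# Three made-up patients: risk genes high, protective genes high, all equal.
patient_a = {g: (2.0 if g in RISK_GENES else 0.0) for g in COEFFICIENTS}
patient_b = {g: (0.0 if g in RISK_GENES else 2.0) for g in COEFFICIENTS}
patient_c = {g: 1.0 for g in COEFFICIENTS}

scores = [risk_score(p) for p in (patient_a, patient_b, patient_c)]
groups = stratify(scores)
print(list(zip(scores, groups)))
```

In practice the coefficients would come from a fitted multivariate Cox model, and the cutoff would be chosen on the training cohort before being applied to new patients.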