| Purpose: Prostate cancer (PCa) has become one of the most common malignant tumors in men, especially in middle-aged men. It accounts for about 10% of cancers in men and has a high mortality rate. This study therefore uses bioinformatics analysis to screen for key genes related to the prognosis of PCa and to establish a risk regression model. Method: Transcriptome profiles and clinical data of prostate cancer (PCa) patients were obtained from The Cancer Genome Atlas (TCGA) database. Differential expression analysis of the standardized data was performed in R (version 3.6.2, 64-bit Windows), mainly with the edgeR package; the criteria for screening differential genes were p < 0.05 and |log2FC| > 1. Functional annotation (GO) and pathway enrichment analysis (KEGG) of the differential genes were carried out in R. The differential genes were then submitted to gene set enrichment analysis with the GSEA software, using the nominal p value (p < 0.05) as the screening criterion to identify gene sets for further study, and the cell cycle-related pathways were selected for the next analysis. Through univariate and multivariate Cox regression analysis, the key genes related to the prognosis of PCa were screened out and a risk regression model was constructed. According to the median risk score, the patients were divided into a high-risk group and a low-risk group for survival analysis, and Kaplan-Meier (KM) and ROC curves were drawn to verify the accuracy of the model. For further verification, the PCa patients in the TCGA database were randomly divided into a training set and a test set, and survival analysis was performed again in each set. Results: 1. Transcriptome profiles and clinical data of 498 prostate cancer patients and 52 normal samples were obtained from the TCGA database. After standardization and preprocessing, differential analysis was performed using the R
language, and the screening criteria p < 0.05 and |log2FC| > 1 yielded 2963 differential genes, of which 1631 were up-regulated and 1332 were down-regulated. 2. Functional annotation and pathway enrichment analysis of the differential genes were performed in R. The enrichment results show that, in terms of molecular function, the PCa-related differential genes are mainly associated with passive transmembrane transporter activity, channel activity, substrate-specific channel activity, metal ion transmembrane transporter activity, and ion channel activity. They are mainly involved in biological processes such as pattern specification, muscle system process, regulation of membrane potential, signal release, and axonogenesis. Their products are mainly located in cellular components such as the synaptic membrane, presynapse, neuronal cell body, collagen-containing extracellular matrix, and apical part of the cell. KEGG analysis showed that the significantly enriched pathways were neuroactive ligand-receptor interaction, the calcium signaling pathway, and the cAMP signaling pathway. 3. Gene set enrichment analysis with the GSEA software yielded 12 gene sets with p < 0.05. The cell cycle-related E2F and G2M pathways were selected for further study; E2F (p = 0) contains 36 genes and G2M (p = 0) contains 39 genes. 4. Univariate Cox regression analysis identified 21 prognosis-related genes with p < 0.05, and multivariate Cox regression analysis then narrowed these to 3 prognosis-related genes, PRC1, E2F2, and KIF18B, from which a risk regression model was established: Risk score = 1.3807 × PRC1 expression + 0.9756 × E2F2 expression - 0.9704 × KIF18B expression. 5. The 498 PCa patients in this study were divided according to the median risk score into a high-risk group (n = 249) and a low-risk group (n = 249). The Kaplan-Meier overall survival curves of the two groups were significantly different (p = 0.00735); using the area under the ROC curve, that is, the
AUC, to verify the Cox regression model gave AUC = 0.726. The AUCs of the training set and the test set were also greater than 0.7, which further supports the sensitivity and accuracy of the model. Conclusion: In this study, we derived a three-gene (PRC1, E2F2, and KIF18B) risk model by analyzing the prostate cancer data in the TCGA database and verified the model to support its accuracy and credibility. The results suggest that the risk model can predict the survival of PCa patients and that patients with high risk scores have a poor prognosis. |
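
The risk model in the abstract is a linear combination of three expression values followed by a split at the median score. The following is a minimal Python sketch of that arithmetic, not the authors' code (the study itself used R). The coefficients are those reported above; the patient IDs and expression values are made up for illustration, and the rule that scores strictly above the median count as high risk is an assumption, since the abstract does not say how ties are handled or in which units expression is measured.

```python
from statistics import median

# Coefficients as reported in the abstract's risk regression model.
COEFFICIENTS = {"PRC1": 1.3807, "E2F2": 0.9756, "KIF18B": -0.9704}


def risk_score(expression):
    """Linear risk score: sum of coefficient * expression over the three genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: scores strictly above the median are 'high'; the abstract
    does not state how ties at the median are handled.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for four made-up patients (illustration only).
patients = {
    "P1": {"PRC1": 1.0, "E2F2": 1.0, "KIF18B": 1.0},
    "P2": {"PRC1": 2.0, "E2F2": 0.5, "KIF18B": 1.5},
    "P3": {"PRC1": 0.2, "E2F2": 0.1, "KIF18B": 0.4},
    "P4": {"PRC1": 3.0, "E2F2": 2.0, "KIF18B": 0.5},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

Because the KIF18B coefficient is negative, higher KIF18B expression lowers the score in this model, while higher PRC1 or E2F2 expression raises it. The resulting `groups` labels are what a Kaplan-Meier comparison of high-risk and low-risk patients would be stratified on.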