| In modern society, the incidence of cancer is increasing across different population groups. Because the mechanisms of cancer are complex, the recurrence of advanced disease greatly reduces patient survival and threatens human life. Previous studies of colorectal cancer (CRC) microarray data focused on selecting differentially expressed genes, without elaborating on the interactions within gene networks or the association between co-expression modules and phenotypic characteristics. Moreover, the traditional threshold selection rules for co-expression networks lose part of the information in the data, whereas the weighted gene co-expression network analysis (WGCNA) algorithm retains more information by selecting the optimal soft threshold while still meeting the requirements of scale-free network construction. In addition, microarray (chip) datasets contain too few samples and too little clinical information, whereas the TCGA database provides a large number of samples and phenotypic data, which is more conducive to studying important phenotype-related modules and identifying core genes. In this research, weighted co-expression network analysis was performed on colorectal cancer data collected from the GEO and TCGA databases. From the perspective of gene classification, the core modules related to colorectal cancer phenotypes were mined, and the core genes of colorectal cancer were then identified and comprehensively verified by bioinformatics methods. Finally, the selected core genes were regarded as drug therapeutic targets for colorectal cancer. Based on data mining of colorectal cancer, the main research contents of this thesis are as follows: (1) A set of colorectal cancer microarray data from the GEO database was studied. First, the genes whose variance ranked in the top 25% were analyzed by weighted co-expression network analysis, and 10 modules were obtained. Two modules highly related to CRC were selected with 
p < 0.05. The DAVID database was used for GO and KEGG enrichment analysis. The results suggest that the genes in these two modules are mainly enriched in cell regulation, extracellular exosome and metabolic pathways, as well as in cell division, protein binding, cell cycle and the p53 signaling pathway. The genes in the two core modules were then visualized with Cytoscape, and core genes with p < 0.05 were selected based on survival analysis in the GEPIA database. Five core genes, PPARG, ACO2, MYC, CCNB2 and NUP37, were eventually obtained as possible biomarkers for colorectal cancer. Finally, the GEPIA database, the HPA database and the literature were combined to verify their clinical expression and prognostic relevance. These results may help advance research on drug targets for colorectal cancer. (2) Weighted co-expression network analysis was performed on colorectal cancer data from the TCGA database, with the aim of screening out important modules related to patient race and predicting the core genes associated with colorectal cancer. First, a series of preprocessing steps was applied to the colorectal cancer data in the TCGA database, yielding an expression profile matrix of 107 tumor samples and 19565 genes and a clinical information matrix of 107 tumor samples and 9 phenotypic characteristics. To reduce the amount of computation and eliminate the influence of background noise, we selected the top 5000 genes ranked by mean value from largest to smallest and removed outliers by hierarchical clustering. We then performed weighted co-expression network analysis on the expression matrix containing 106 tumor samples and 5000 genes. A soft threshold of 7 was selected to satisfy the requirement of scale-free network construction, and 12 co-expression modules were obtained by the dynamic tree cut method. Based on the correlation analysis between the phenotypic characteristics and the 
obtained modules, two core modules related to race were identified. FunRich software was used to explore the important biological pathways of the genes in these modules. The enrichment results showed that the genes in the core modules were mainly involved in protein metabolism, oxidoreductase activity, structural constituent of ribosome, the nectin adhesion pathway and LKB1 signaling events. Since statistically significant differentially expressed genes play an important role in colorectal cancer, the limma package was applied to the colorectal cancer data for differential expression analysis, and 83 common genes were obtained by intersecting the differentially expressed genes with the genes in the core modules. The survival package in R was used for Kaplan-Meier (KM) survival analysis of the common genes. With a log-rank p < 0.05, three core genes associated with colorectal cancer prognosis were identified: SLMAP, GALNT6 and ANTXR2. Finally, the differential expression of the core genes between normal and tumor groups was verified using the GEPIA and UALCAN databases. The relationship between the core genes and race was further explored in the UALCAN database, and the results showed that the three core genes differed among races. The conclusions of this thesis provide a research direction for the selection of prognostic markers of colorectal cancer. |
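
The variance filter and soft-threshold step described in the abstract can be illustrated with a minimal Python sketch. It is not the pipeline used in this work, which relies on the WGCNA algorithm. The function names (`top_variance_genes`, `connectivity`, `scale_free_fit`), the unsigned adjacency |cor|^beta, the equal-width binning and the handling of constant genes are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant vector (a simplification)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_variance_genes(expr, fraction=0.25):
    """Keep the given fraction of genes with the largest sample variance.

    expr maps a gene name to its list of expression values across samples.
    """
    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    ranked = sorted(expr, key=lambda g: variance(expr[g]), reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]


def connectivity(expr, genes, beta):
    """Connectivity k_i = sum over j != i of |cor(i, j)| ** beta (unsigned network)."""
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k


def scale_free_fit(k_values, n_bins=10):
    """Signed R^2 of the linear fit log10 p(k) ~ log10 k over equal-width bins.

    Positive values mean p(k) decreases with k, as expected for a
    scale-free topology. Empty bins are skipped (a simplification).
    """
    lo, hi = min(k_values), max(k_values)
    if hi <= lo:
        return 0.0
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for k in k_values:
        b = min(int((k - lo) / width), n_bins - 1)
        sums[b] += k
        counts[b] += 1
    xs, ys = [], []
    for s, c in zip(sums, counts):
        if c > 0 and s > 0:
            xs.append(math.log10(s / c))              # mean connectivity in the bin
            ys.append(math.log10(c / len(k_values)))  # fraction of genes in the bin
    if len(xs) < 3:
        return 0.0
    r = pearson(xs, ys)  # for a simple linear fit, R^2 equals r^2
    return -r * r if r > 0 else r * r
```

In such a sketch, one would compute `scale_free_fit(list(connectivity(expr, genes, beta).values()))` for a range of candidate `beta` values and keep the smallest power whose fit exceeds a chosen cutoff. The cutoff itself is a modelling choice that the abstract does not state.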
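
The log-rank criterion used to select prognostic genes can likewise be sketched in a few lines. The abstract uses the survival package in R for this step. The version below is a standard-library Python illustration of the two-group log-rank statistic, assuming that patients have already been split into two groups (for example, high and low expression of one gene). The function name `logrank_test` and the input layout are hypothetical.

```python
import math


def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test.

    time_*  : follow-up times for each patient in the group
    event_* : 1 if the event (death) was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(time_a, event_a)]
    data += [(t, e, 1) for t, e in zip(time_b, event_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Under these assumptions, a gene would be retained as a candidate core gene when the returned p-value falls below 0.05, mirroring the log-rank threshold stated in the abstract.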