
Weighted Gene Co-expression Network Algorithm (WGCNA) Combined With Lasso Regression Model To Construct A Cervical Cancer Prognosis Model

Posted on: 2022-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2504306776463804
Subject: Oncology
Abstract/Summary:
To explore prognostic markers of cervical cancer, this study used transcriptome sequencing data from 306 patients with cervical cancer, the corresponding clinical follow-up information, and transcriptome sequencing data from 3 adjacent tissue samples, all obtained from The Cancer Genome Atlas (TCGA) database. A total of 4103 genes with significantly different expression between cervical cancer and adjacent tissues were identified (2273 significantly up-regulated and 1830 significantly down-regulated in tumor tissue).

The weighted gene co-expression network algorithm (WGCNA) was then used to construct a co-expression network from the differentially expressed genes. All differentially expressed genes could be divided into 22 co-expression network modules, each labeled with a color and a number and containing between 19 and 349 genes. Six modules (grey-1, turquoise-3, tan-5, green-12, royalblue-15, lightgreen-23) were significantly correlated with 4 clinical features (survival time, pathological features, HPV infection status, HPV type). The turquoise-3 module was positively correlated with survival time and HPV infection status and negatively correlated with pathological features; the tan-5 module was positively correlated with survival time and HPV type; the green-12 module was positively correlated with pathological features; the lightgreen-23 module was negatively correlated with HPV infection status; the grey-1 module was positively correlated with HPV type; and the royalblue-15 module was positively correlated with survival time.

The clinical information was then integrated, the relationship between modules and phenotypes was analyzed, and the modules of interest were imported into the network software Cytoscape for visualization. Through network visualization, the genes with the highest connectivity in the module network were identified as key node genes. Cox survival regression analysis was performed on these key node genes to construct a proportional hazards regression model. Finally, the expression values of the top 5% of genes in the module were combined with patient prognosis information, and a prognosis model was built using lasso (least absolute shrinkage and selection operator) regression. Screening by the lasso regression model selected 8 key node genes (ZSCAN18, MAATS1, LINC00649, ITM2A, PERP, DUOXA1, CRYBG2, GIMAP7) for inclusion in the survival and prognosis model.

After the 8 genes in the model were scored, a risk coefficient was calculated according to the median expression of each gene, and patients were divided into high-risk and low-risk groups. The 3-year survival rate was 0.75 in the high-risk group and 0.9 in the low-risk group, a significant difference (P = 0.0001), indicating that the scoring model can accurately predict patient prognosis. Among the single genes in the model, ITM2A had the best univariate predictive performance (P = 0.0087): patients with expression above the median had a better prognosis, with a 3-year survival rate of 0.8, compared with only 0.6 for patients with expression below the median. ZSCAN18 and MAATS1 were also significant at the P < 0.05 level (both P = 0.033). For ZSCAN18, the 3-year survival rate was 0.75 with expression above the median and only 0.59 below it; for MAATS1, it was 0.76 above the median and 0.61 below it. PERP had a univariate P value of 0.029, with a 3-year survival rate of 0.76 above the median and 0.57 below it. CRYBG2, DUOXA1 and LINC00639 performed poorly in univariate analysis (P = 0.62, 0.61 and 0.21, respectively), so these three genes could not predict survival as single genes.

To verify the accuracy and specificity of the model, a time-dependent receiver operating characteristic (ROC) curve was drawn for the scoring model. The area under the curve (AUC) was 0.78 at 1 year, 0.72 at 3 years and 0.73 at 5 years; all three values exceed 0.7, indicating that the prediction model has high accuracy and specificity. The prediction model was also compared with clinical tumor staging (TNM). In conclusion, this study constructed a comprehensive weighted co-expression network of cervical cancer, found several key node genes highly related to the prognosis of cervical cancer, and developed a practical cervical cancer prognosis scoring model that can provide suggestions and guidance for the treatment of cervical cancer patients.
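
The scoring step described in the abstract (a per-patient risk score, a median split into high-risk and low-risk groups, and a 3-year survival rate for each group) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the gene names GENE_A and GENE_B, the coefficients, and the toy patient data are hypothetical placeholders and are not values from the thesis, and the thesis does not state which software or exact score formula was used. The sketch uses a linear score (coefficient times expression, summed over genes) and a plain Kaplan-Meier estimator; it does not include the significance test between groups.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """Linear risk score per patient: sum of coefficient * expression over the model genes."""
    return [sum(coefficients[g] * row[g] for g in coefficients) for row in expression]

def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def km_survival(times, events, t):
    """Kaplan-Meier estimate of the survival probability at time t.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    """
    surv = 1.0
    event_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if ti == u and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv

# Hypothetical coefficients and toy data (NOT from the thesis).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.3}
expression = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 1.0, "GENE_B": 3.0},
    {"GENE_A": 3.0, "GENE_B": 0.0},
    {"GENE_A": 0.0, "GENE_B": 2.0},
]
times = [1.0, 4.0, 2.0, 5.0]   # follow-up in years
events = [1, 0, 1, 0]          # 1 = death, 0 = censored

scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)

def group_survival(label, t):
    """3-year (or t-year) Kaplan-Meier survival within one risk group."""
    idx = [i for i, g in enumerate(groups) if g == label]
    return km_survival([times[i] for i in idx], [events[i] for i in idx], t)

s3_high = group_survival("high", 3.0)
s3_low = group_survival("low", 3.0)
print(groups, s3_high, s3_low)
```

In practice the coefficients would come from the fitted lasso-penalized Cox model, and a log-rank test would typically be used to compare the two groups.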
Keywords/Search Tags:machine learning, algorithm, cervical carcinoma, prognosis, WGCNA