Construction of a Prognostic Model of Lung Adenocarcinoma Based on Bioinformatics Analysis of Glycolysis-Related Genes

Posted on: 2022-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Kang
GTID: 2480306329482914
Subject: Surgery
Abstract/Summary:
Objective: Data on lung adenocarcinoma samples from The Cancer Genome Atlas (TCGA) program were downloaded, integrated, and analyzed to explore the differential expression of glycolytic genes between lung adenocarcinoma tissue and adjacent tissue. The analysis also examined the relationship among the differential expression of glycolytic genes, pathological features, and prognosis in patients with lung adenocarcinoma. Furthermore, the study set out to establish a prognostic model that predicts survival from glycolytic gene expression, which may provide new ideas for clinical treatment and prognosis evaluation in patients with lung adenocarcinoma.

Methods: The original mRNA expression data for 497 lung adenocarcinoma cases and 54 paracancerous tissue cases were downloaded from the TCGA database. R was used to sort and merge the data, which were then analyzed in a unified, standardized way. The R package "limma" was used to analyze differences in glycolytic gene expression between the samples. After glycolysis gene expression was combined with the survival data of the samples, the "survival" package was used to carry out univariate Cox regression on the glycolysis-related genes and to screen out those related to the survival of patients with lung adenocarcinoma. Analysis with the "corrplot" package showed that the expression levels of the glycolysis genes were closely correlated with one another. The "ConsensusClusterPlus" package was then used to cluster the samples according to glycolysis gene expression, and survival analysis and a heat map showed significant differences in prognosis among the resulting clusters. The "glmnet" package was used for lasso regression on the glycolysis-related differential genes, with 10-fold cross-validation to determine the optimal λ value, so as to screen out the prognosis-related glycolysis genes and their regression coefficients and to obtain a risk score based on glycolysis gene expression. Kaplan-Meier survival curves were drawn to confirm the significant difference between the high-risk and low-risk groups. The "survivalROC" package was then used to draw receiver operating characteristic (ROC) curves, and the area under the curve (AUC) was calculated to evaluate the accuracy of the risk scoring formula. Other clinical characteristics of lung adenocarcinoma that may affect prognosis were also considered, including age, gender, T stage, N stage, and cancer stage. Univariate Cox regression was used to investigate the correlation between each factor and overall survival, and multivariate Cox regression was used to further verify the risk score derived from glycolytic gene expression. Finally, a risk model was constructed to evaluate the 1-, 2-, and 3-year survival rates of patients with lung adenocarcinoma according to T stage, N stage, cancer stage, and risk score, and nomograms were drawn with the "rms" package to predict survival.

Results: In the data from 497 lung adenocarcinoma cases and 54 paracancerous tissue cases in the TCGA database, 62 glycolysis-related genes were found to be differentially expressed, including 54 up-regulated and 8 down-regulated genes. With |logFC| ≥ 1 and P < 0.05 as the screening condition, 19 glycolytic genes were found to be differentially expressed, of which 2 were down-regulated and 17 were up-regulated. The expression of the glycolytic genes was then combined with the survival status of the samples. Univariate Cox regression analysis showed that 14 genes were related to prognosis: GAPDH, PKM, PGAM1, NUP50, TPI1, ENO1, PFKP, GPI, ALDOA, PPP2R1A, HK3, NUP37, HK2, and PRKACA. According to the forest plot, PGAM1, NUP50, PPP2R1A, NUP37, and HK2 were risk factors (HR > 1), while HK3 and PRKACA were protective factors (HR < 1). Together with the correlation analysis, this suggests that glycolytic genes are differentially expressed in lung adenocarcinoma tissue and that their correlated expression can significantly affect tumor prognosis. Cluster analysis divided the lung adenocarcinoma patients into three types according to gene expression. The survival curves showed a significant survival gap among the three types, with type 2 having the worst prognosis, which further confirmed the correlation between glycolytic genes and the prognosis of lung adenocarcinoma. The 35 glycolytic genes (P < 0.4) were further analyzed by lasso regression, and 11 key genes (PRKACA, PPP2R1A, PKM, PGAM1, PFKP, NUP50, HK3, GAPDH, ENO3, ENO1, ALDOA) and their regression coefficients were screened out. From these, a formula for calculating the prognostic risk score of lung adenocarcinoma from the expression of the 11 glycolytic genes was obtained, and its validity and accuracy were confirmed. The patients were divided into a high-risk group and a low-risk group based on the clinically related variables and the new risk score model. The low-risk group had a better prognosis than the high-risk group (P < 0.001). The ROC curves showed that the AUC values for 1-, 2-, 3-, and 5-year survival were 0.742, 0.725, 0.673, and 0.608, respectively, indicating that the model has good predictive ability for the prognosis of patients with lung adenocarcinoma. Univariate Cox regression was used to identify factors significantly related to overall survival, and multivariate Cox regression further verified that the risk score obtained from the expression of the key differential glycolysis genes is an independent risk factor for the prognosis of lung adenocarcinoma. Finally, a risk model was built to evaluate the survival rate of patients with lung adenocarcinoma according to T stage, N stage, cancer stage, and risk score, and nomograms were drawn with the "rms" package to predict survival.

Conclusions: By analyzing data from the TCGA database, this study demonstrated a significant relationship between glycolytic genes and prognosis in patients with lung adenocarcinoma. A risk scoring formula based on 11 main glycolytic genes was determined, and the risk score was then combined with cancer staging to build a risk model that predicts the survival rate of lung adenocarcinoma patients. It is hoped that this study will contribute to clinical research and individualized treatment for patients with lung adenocarcinoma.
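
The abstract says the risk score is obtained from the lasso regression coefficients and the expression of the 11 selected genes, but it does not give the formula, the coefficients, or the cutoff used to separate the high-risk and low-risk groups. The following is a minimal sketch under stated assumptions: the score is taken to be the usual linear form (the sum of coefficient times expression), and the cohort median is assumed as the cutoff. The coefficient values, the three-gene subset, and the patient expression values are made-up placeholders, and Python is used only for illustration; the thesis itself worked in R.

```python
from statistics import median

# Hypothetical coefficients for illustration only; the abstract does not
# report the fitted lasso coefficients, and only 3 of the 11 genes are shown.
COEFS = {"GAPDH": 0.10, "PKM": 0.05, "HK3": -0.20}


def risk_score(expression, coefs):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def split_by_median(scores):
    """Label a patient 'high' if the score is above the cohort median, else 'low'.

    The median cutoff is an assumption; the abstract does not state the cutoff.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy expression values (invented, not TCGA data).
patients = {
    "P1": {"GAPDH": 8.0, "PKM": 6.0, "HK3": 2.0},
    "P2": {"GAPDH": 5.0, "PKM": 4.0, "HK3": 6.0},
    "P3": {"GAPDH": 7.0, "PKM": 7.0, "HK3": 3.0},
}

scores = {pid: risk_score(expr, COEFS) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

With a positive coefficient, higher expression raises the score, which matches a gene with HR > 1; a negative coefficient lowers it, which matches a protective gene such as HK3 in the abstract.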
Keywords/Search Tags: Glycolysis, Adenocarcinoma of lung, TCGA, Prognostic model, Bioinformatics