Objective: Glioma is the most prevalent malignant tumour of the central nervous system and is classified as WHO grade I-IV on the basis of pathology; WHO grade III and IV tumours are high-grade gliomas with a poor prognosis. This study used a bioinformatics approach to analyse messenger ribonucleic acid (mRNA) expression data from The Cancer Genome Atlas (TCGA) and the Chinese Glioma Genome Atlas (CGGA). By combining mRNA expression in high-grade glioma with patient clinical data, it aimed to identify independent risk factors affecting the prognosis of patients with high-grade glioma and to develop a more accurate prognostic model for estimating their survival.
Methods: The mRNA expression profiles and clinical data of glioma patients were obtained from the TCGA and CGGA databases, and glioma samples meeting the inclusion criteria were selected for further analysis. High-grade glioma samples from the TCGA database were used as the training set, and high-grade glioma samples from the CGGA database were used as the validation set. Gene expression in WHO grade III and IV glioma patients was compared with that in WHO grade II patients, and differentially expressed genes were screened using P<0.05 and |log2FC|≥1 as the criteria. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the differentially expressed genes were performed using the DAVID tool. LASSO regression analysis was used to select the best prognostic candidate genes, and a risk score was calculated for each sample. Samples were divided at the median risk score into a high-risk group (risk score > median) and a low-risk group (risk score < median). Variables with P<0.05 in univariate Cox regression were entered into multivariate Cox regression analysis, and those with P<0.05 were identified as prognosis-related variables. Kaplan-Meier survival curves were plotted to analyse the difference in prognosis between the high- and low-risk groups. A nomogram based on the risk score was constructed using R software, and its C-index was determined. Calibration curves were plotted to assess the agreement between predicted and actual survival.
Results: After screening, 390 high-grade glioma samples from the TCGA database were used as the training set and 330 high-grade glioma samples from the CGGA database were used as the validation set. The differential analysis identified 611 differentially expressed genes, of which 520 were upregulated and 91 were downregulated. Six genes (ABCG8, C4orf6, DMBX1, FAM90A7, MMP1, PI3) were included in the risk assessment model after LASSO regression analysis, and Kaplan-Meier analysis showed a significant difference in survival probability between the high- and low-risk groups (P<0.01). The prognosis-related variables identified by Cox regression analysis were risk score, age and IDH status, and the model was presented as a nomogram. The model was validated in the CGGA validation set, with a C-index of 0.659, supporting its predictive accuracy. The calibration curves for predicted 1-, 2-, 3- and 5-year survival lay very close to the diagonal, indicating that the predictions agreed well with actual survival and that the model discriminated well.
Conclusion: In this study, a prognostic model based on age, IDH status and a risk score derived from six genes (ABCG8, C4orf6, DMBX1, FAM90A7, MMP1, PI3) was developed by combining clinical and genomic data using bioinformatics and statistical methods. Validation showed that the model has good predictive power for survival in patients with high-grade glioma.
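
The screening threshold and the median split described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code, and the study itself reports using R. The coefficient values below are hypothetical placeholders, since the abstract names the six genes but does not report their LASSO weights. The risk score is assumed to take the usual form of a coefficient-weighted sum of expression values. The abstract does not say how samples with a score exactly equal to the median are handled; this sketch assigns them to the low-risk group.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only; the abstract does not
# report the fitted LASSO weights for these six genes.
COEFFICIENTS = {
    "ABCG8": 0.10,
    "C4orf6": 0.05,
    "DMBX1": 0.08,
    "FAM90A7": 0.03,
    "MMP1": 0.12,
    "PI3": 0.07,
}


def is_differentially_expressed(p_value, log2_fc):
    """Apply the screening criteria from the Methods: P<0.05 and |log2FC|>=1."""
    return p_value < 0.05 and abs(log2_fc) >= 1


def risk_score(expression):
    """Assumed form of the score: sum of coefficient * expression per gene."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label samples 'high' if score > median, otherwise 'low'.

    Ties with the median go to 'low'; this tie rule is an assumption.
    """
    cutoff = median(scores.values())
    return {
        sample: ("high" if score > cutoff else "low")
        for sample, score in scores.items()
    }
```

With real data, `expression` would be one patient's expression values for the six genes, and the resulting groups would then be compared with Kaplan-Meier curves as described in the Methods.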