| Colorectal cancer (CRC) ranks third in incidence and second in mortality among cancers worldwide, and both rates continue to rise. However, owing to metastasis, relapse, drug resistance and other factors, the prognosis of most patients remains unsatisfactory. There is therefore an urgent need to discover new and effective prognostic markers and therapeutic targets. M6A modification of LncRNA plays an important role in the development of tumors. Based on COAD gene expression and clinical data in The Cancer Genome Atlas (TCGA) public database, this study constructed a prognostic model of long non-coding RNAs (LncRNAs) associated with N6-methyladenosine (M6A). We explored the relationship between the model-based risk groups and the clinical characteristics and immune infiltration of colorectal adenocarcinoma, and validated the model in the Gene Expression Omnibus (GEO) database and in tissue experiments, in order to confirm its effectiveness and to explore new targets for prognostic monitoring and drug therapy in COAD. The study is divided into the following three parts:

Part One Screening of LncRNAs associated with M6A in colorectal adenocarcinoma and establishment of an independent prognostic model

Objective: To characterize the expression of M6A-associated LncRNAs in COAD, study the relationship between M6A-associated LncRNAs and the prognosis of COAD, and establish a prognostic model in order to search for new prognostic markers and drug therapy targets.

Methods: RNA sequencing (RNA-seq) data for COAD were obtained from the TCGA database, and clinical data were collected and grouped by survival time, survival status, gender, age, stage and TNM stage. Perl, web-based tools and R were then used to analyze the M6A-related genes and to extract the expression of RNAs and LncRNAs as well as of the M6A-related genes. Finally, an R package was used to obtain the expression of the LncRNAs associated with M6A. The M6A-related LncRNAs associated with prognosis were obtained by univariate
Cox regression. The samples with prognostic M6A-related LncRNA data were divided into training and test datasets (50% in each). Lasso regression with cross-validation, run in R, was used to optimize the model, yielding an independent prognostic model composed of LncRNAs.

Results: In this study, 39 normal samples and 398 tumor samples were downloaded from the TCGA database, and 9 prognostic M6A-related LncRNAs were obtained by univariate Cox regression: LINC02657, NSMCE1-DT, AC139149.1, ZKSCAN2-DT, AC156455.1, ZEB1-AS1, AP001619.1, AL391422.4 and ATP2B1-AS1. All were high-risk LncRNAs. The expression of AC156455.1 in tumor tissue was significantly higher than in normal tissue. Lasso regression in R yielded an independent prognostic model consisting of 7 LncRNAs. The model formula is: risk score = LINC02657 × 0.246632699492067 + AC139149.1 × 0.276937064691493 + ZKSCAN2-DT × 0.016797173163626 + AC156455.1 × 0.187570142850178 + ZEB1-AS1 × 0.569626111717306 + AL391422.4 × 0.537921082314907 + ATP2B1-AS1 × 0.555444251652824. Survival differed between the high- and low-risk groups in both the training and test groups, with the high-risk group having a lower survival rate. The area under the curve (AUC) in the training and test groups was 0.733 and 0.673 for 1-year survival, 0.729 and 0.715 for 3-year survival, and 0.748 and 0.859 for 5-year survival, respectively, indicating that the model has a degree of accuracy in predicting prognosis. The risk curves, survival status plots and heat maps showed that there were more deaths in the high-risk group. In both the training and test groups, the expression of all prognosis-related LncRNAs was higher in the high-risk group, and the expression of LncRNA ZEB1-AS1 was significantly increased. Univariate and multivariate Cox regression identified the risk score as an independent prognostic indicator, which verified the independence of the model.

Conclusions: 1. Compared with normal tissues, M6A-related LncRNAs
were significantly aberrantly expressed in COAD tissues; all of them were high-risk LncRNAs, comparable to oncogenic factors. 2. The prognostic model composed of LINC02657, AC139149.1, ZKSCAN2-DT, AC156455.1, ZEB1-AS1, AL391422.4 and ATP2B1-AS1 was an independent prognostic indicator of COAD.

Part Two Relationship of clusters and risk-score groups with clinicopathologic features and immune infiltration based on the prognostic model

Objective: To explore the relationship between the prognostic model composed of M6A-associated LncRNAs and the clinicopathologic features and immune infiltration of COAD.

Methods: The survival and survminer R packages were used to verify that the model was applicable to patients in different clinical groups. We used the limma, ggpubr and pheatmap packages to explore differences in the seven LncRNAs across clinicopathological features, immune scores, clusters, and high- and low-risk groups (*** p<0.001, ** p<0.01, * p<0.05). We then used the limma, ggplot2, ggpubr and ggExtra packages to identify associations between immune cells and risk scores. Samples with P>0.05 were filtered out before analysis, and the fractions of all immune cells in each sample sum to 1. The Spearman test was used to obtain the correlation coefficients and p values, which are shown in scatter plots.

Results: Consensus clustering based on the prognostic M6A-related LncRNAs yielded two clusters (type 1 and type 2). Survival differed between the two clusters, with type 2 patients having a lower survival rate. The expression of AC156455.1 was significantly increased in type 2, suggesting that AC156455.1 may be mainly related to the occurrence and development of COAD. Validation of the prognostic model across clinical features showed that the model was applicable across age, sex, T stage, N stage and other clinical features. By analyzing the differences in LncRNA expression across clusters, clinicopathological features, immune scores, and high- and low-risk groups, we found that there were
differences between the high- and low-risk groups across clusters and across N and M stages. The proportion of cluster 2 was higher in the high-risk group and its survival rate was lower, suggesting that the high-risk group was associated with poor prognosis. The worse the clinical stage, the higher the risk. All LncRNAs associated with prognosis were high-risk markers. The expression of ZEB1-AS1 was significantly increased in the high-risk group. Five types of immune cells differed between the two clusters: CD4 memory activated T cells, follicular helper T cells, activated NK cells, CD4 memory resting T cells and memory B cells. The first three were up-regulated in type 2, which had a low survival rate, and may therefore be related to poor prognosis. CD4 memory resting T cells were up-regulated in type 1. Immune scores, stromal scores and ESTIMATE scores differed between type 1 and type 2 patients, and all three were lower in type 2. The lower the score, the higher the tumor purity, which may be associated with poor prognosis. Memory B cells were positively correlated with the risk score: the higher the number of memory B cells, the higher the patient's risk.

Conclusions: 1. Validation of the prognostic model across clinical features showed that the model was applicable across age, gender, T stage, N stage and other clinical features. 2. The high-risk group had a poor prognosis. The worse the clinical stage, the higher the risk. All LncRNAs associated with prognosis were high-risk markers. The expression of LncRNA ZEB1-AS1 was significantly increased in the high-risk group, so the prognostic value of LncRNA ZEB1-AS1 in COAD needs further study. 3. Immune cells are associated with the risk of patients with COAD. CD4 memory activated T cells, follicular helper T cells and activated NK cells may be associated with poor prognosis, providing a reference for immunotherapy. Memory B cells were positively
correlated with patients' risk: the higher the number of memory B cells, the higher the risk.

Part Three Validation of the prognostic model in the GEO database and in tissue experiments

Objective: To verify the validity and prognostic value of the model in an independent database and in tissue experiments.

Methods: The GSE39582 dataset, containing 579 samples, was obtained from the GEO database, and the expression data of the 7 M6A-related LncRNAs and the survival data of the samples were extracted. Risk scores were calculated according to the model, and subjects were divided into high- and low-risk groups using the median risk score of the TCGA training dataset as the cutoff. Using the R survival package, we tested whether survival differed between the high- and low-risk groups. We collected 20 pairs of COAD samples and adjacent normal tissues from the Second Hospital of Hebei Medical University. The expression levels of the 7 LncRNAs were detected by qRT-PCR, and the 2^(-ΔΔCt) method was used to calculate the relative expression level.

Results: Risk scores for the GEO dataset were calculated according to the model, and samples were assigned to high- and low-risk groups using the median risk score of the TCGA training dataset. Survival differed between the high-risk and low-risk groups. The expression of the 7 LncRNAs in tumor tissues was significantly higher than in adjacent normal tissues, especially for ZKSCAN2-DT and AC156455.1, which was consistent with the results from the TCGA data.

Conclusions: 1. The M6A-related LncRNA model of COAD was effectively validated on the GEO dataset. 2. The tissue qRT-PCR experiment provided preliminary evidence that the prognostic model of M6A-related LncRNAs in COAD may be reliable and effective. |
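
The following is a minimal Python sketch of two calculations described in the abstract: the seven-LncRNA risk score with a fixed cutoff, and the 2^(-ΔΔCt) relative expression. The study itself used R, so this is an illustration only and not the authors' code. The coefficients are copied from the model formula in Part One. The function names, the rule that a score exactly equal to the cutoff counts as low risk, the unnamed reference gene, and all sample values are assumptions made here for illustration.

```python
from statistics import median

# Coefficients copied from the risk-score formula in Part One.
COEFFICIENTS = {
    "LINC02657": 0.246632699492067,
    "AC139149.1": 0.276937064691493,
    "ZKSCAN2-DT": 0.016797173163626,
    "AC156455.1": 0.187570142850178,
    "ZEB1-AS1": 0.569626111717306,
    "AL391422.4": 0.537921082314907,
    "ATP2B1-AS1": 0.555444251652824,
}


def risk_score(expression):
    """Weighted sum of the seven LncRNA expression values for one sample."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score, cutoff):
    """Label a sample against a fixed cutoff, e.g. the training-set median.

    Assumption: a score equal to the cutoff is labelled "low"; the abstract
    does not state how ties are handled.
    """
    return "high" if score > cutoff else "low"


def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """2^(-ΔΔCt): fold change of a target in tumor versus paired normal tissue.

    The reference (housekeeping) gene is hypothetical; the abstract does not
    name the one used.
    """
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    return 2 ** -(delta_ct_tumor - delta_ct_normal)


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only (not study data).
    training = [{gene: level for gene in COEFFICIENTS} for level in (1.0, 2.0, 3.0)]
    cutoff = median(risk_score(sample) for sample in training)

    # A sample from a second cohort is scored with the same coefficients and
    # compared against the training-set cutoff, as in Part Three.
    new_sample = {gene: 2.5 for gene in COEFFICIENTS}
    print(risk_group(risk_score(new_sample), cutoff))

    # Hypothetical Ct values for one tumor/normal pair.
    print(relative_expression(24.0, 18.0, 26.0, 18.0))
```

Taking the cutoff from the training cohort rather than recomputing it in the validation cohort mirrors the procedure described for the GEO dataset, where groups were defined by the median risk score of the TCGA training dataset.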