
Exploration Of Molecular Mechanism And Prognosis Risk Model Of Lung Adenocarcinoma Based On Bioinformatics

Posted on: 2019-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: H M Feng
GTID: 2334330566964884
Subject: Surgery
Abstract/Summary:
Objective:
1) To construct a scale-free gene network of lung adenocarcinoma (LUAD) by Weighted Gene Co-Expression Network Analysis (WGCNA) in order to identify the key genes of LUAD and their biological functions.
2) To screen the key "buried" genes among the WGCNA target genes, then combine them with the clinical data of The Cancer Genome Atlas (TCGA) LUAD cohort to find the mRNAs, miRNAs and lncRNAs related to prognosis and to explore their possible biological functions.
3) To mine TCGA mRNA, lncRNA and miRNA expression profile data in order to construct a ceRNA network of LUAD and explore the possible molecular mechanism of LUAD oncogenesis; and, based on the TCGA clinical data, to find the key mRNA, lncRNA and miRNA molecules related to LUAD development and survival prognosis and further explore their biological functions.
4) To establish Cox proportional hazards regression models based on mRNAs, lncRNAs and miRNAs, respectively, and to explore the predictive value of the three models for the prognosis of LUAD.

Methods:
1) LUAD gene expression profile data and clinical data were downloaded from the Gene Expression Omnibus (GEO) database and normalized. The "limma" package in R (version 3.4.3) was used to screen differentially expressed genes (DEGs) with the threshold |log FC| > 1.0 and P < 0.05. WGCNA was then used to construct a scale-free network and find the core genes. Finally, GO and KEGG pathway analyses of the core genes were performed to explore their biological functions.
2) The core genes from the first part were imported into the GenCLiP2.0 website to screen the key "buried" genes, and the relationship between the mRNA expression of the "buried" genes and the prognosis of LUAD was verified with TCGA data. ROC curves were drawn to assess the diagnostic value of the "buried" genes in LUAD. The prognostic value of lncRNAs for LUAD was assessed in the same way, and their possible target genes were predicted with the Co-LncRNA database. The potential target genes of the miRNAs were searched in the miRDB database. Finally, the biological functions of all key "buried" genes were characterized by GO and KEGG analysis.
3) The manifest and metadata files of the LUAD transcriptome were downloaded from the TCGA database, and the raw count data were then downloaded in the cmd environment with the GDC-client download tool. The expression matrix was extracted from the raw data with a Perl script, and the Homo_sapiens.GRCh38.89.chr.gtf.gz file was downloaded from the Ensembl website to obtain an expression matrix labeled by gene symbol. Perl was then used to extract the mRNA and lncRNA expression matrices separately, and the miRNA expression matrix was obtained in the same way. The DEGs of the three kinds of RNA were extracted with the "edgeR" package in R, with thresholds of |log FC| > 1.0 and P < 0.01. The lncRNA DEGs were matched against the miRNA DEGs via the miRcode website, and miRNA target genes were identified through the miRDB, miRTarBase and TargetScan databases. The relationships between lncRNA DEGs and miRNA DEGs, and between miRNA DEGs and mRNA DEGs, were then constructed and imported into Cytoscape to build the ceRNA network. The "survival" package in R was used for gene survival analysis to extract the genes of interest.
4) LUAD gene expression data were downloaded from the TCGA database and merged with the survival data using a Perl script. A univariate (single-gene) Cox analysis was performed first, followed by a multivariate Cox analysis on the genes selected according to the univariate Cox P value. Survival-related linear risk assessment models were constructed from the selected gene expression profiles and regression coefficients, and a risk score was calculated for each sample. The samples were then divided into high-risk and low-risk groups according to the median risk score. Kaplan-Meier survival curves were used to assess differences in overall survival between the high-risk and low-risk groups, and time-dependent ROC curves were used to assess the ability of each model to predict the 3-year survival rate of LUAD. The TCGA clinical data were then randomly divided into two groups using a random number table to verify the value and stability of the different regression models in predicting the prognosis of patients with LUAD, and to assess whether the models are prognostic factors for LUAD independent of other variables.

Results:
1) The GSE40791 dataset contains 194 cases in total, including 94 LUAD samples and 100 normal lung samples, and yielded 3789 DEGs (1625 up-regulated and 2164 down-regulated genes). WGCNA eventually yielded three research modules with 92 hub genes. GO and KEGG analysis suggested that these pivotal genes may play roles in the cell cycle, mitosis, chromosome assembly and separation, extracellular mechanism, immunoglobulin binding, protein serine/threonine kinase activity and tubulin binding, as well as in the p53 signaling pathway, protein digestion and absorption, cell aging and other processes.
2) Importing the hub genes into the GenCLiP2.0 website yielded 10 "buried" genes, including 5 mRNAs, 4 miRNAs and 1 lincRNA. TCGA data were used to verify the differential expression of the mRNAs in LUAD. The results showed that C1orf198 and GRAMD2 were down-regulated in the tumor group, while MAP7D2, MRPL15 and NUP62CL were up-regulated. Survival analysis showed that C1orf198 and GRAMD2 may be protective genes for LUAD, while MAP7D2, MRPL15 and NUP62CL may be oncogenes of LUAD. The ROC analysis showed that the areas under the curve of MAP7D2, MRPL15 and NUP62CL were 0.815, 0.932 and 0.773, respectively, suggesting a high predictive value for LUAD. Twenty target genes were highly matched with the 4 miRNAs by searching the miRDB database. Survival analysis of TCGA data suggested that LINC00926 may be a protective gene of LUAD (HR = 1.33, P = 0.019), and its 20 most significant target genes were obtained by searching the Co-LncRNA database. GO and KEGG analysis suggested that the three kinds of RNA were possibly involved in the cell cycle, intracellular components, organelle assembly and lysis, immune maintenance, spindle assembly, base metabolism, DNA replication, cell division and the p53 signaling pathway; gene or protein binding, signal transduction, tumorigenesis, and the mTOR, MAPK, Ras and cAMP signaling pathways; and cell membrane composition, cytokine-cytokine receptor interaction, and the B cell receptor and chemokine signaling pathways.
3) A total of 594 transcriptome count profiles were obtained from the TCGA database, including 59 normal samples and 535 LUAD samples. A total of 567 miRNA count profiles were obtained, including 46 normal samples and 521 LUAD samples. Screening identified 2504 differentially expressed mRNAs (1977 up-regulated, 527 down-regulated), 1633 lncRNAs (1425 up-regulated, 208 down-regulated) and 111 miRNAs (88 up-regulated, 23 down-regulated). Matching the lncRNA DEGs against the miRNA DEGs yielded 65 lncRNAs and 8 miRNAs. Matching the miRNA DEGs against their target genes yielded 20 mRNAs, from which 488 lncRNA-miRNA-mRNA interaction pairs were constructed. Survival analysis suggested that NAV2-AS2, C20orf197, E2F1 and SLC1A1 are protective genes in LUAD, whereas AC020907.1, AP002478.1, HOTTIP, HOTAIR, LINC00488, LINC00536, KIF23, CLSPN, CCNE1, CEP55, POU6F2-AS1, CHEK1 and hsa-mir-31 are oncogenic genes in LUAD.
4) The risk assessment models were constructed from 7 mRNAs, 6 lncRNAs and 8 miRNAs, respectively. The risk scores were:
mRNA model: -0.1286 × SLC2A1 + 0.1375 × MELTF + 0.1227 × FETUB + 0.098 × NTSR1 + 0.1071 × VAX1 + 0.1169 × FAM83A + 0.1467 × ANLN
lncRNA model: -0.1693 × AC034223.2 + 0.1531 × LINC01312 + 0.1854 × AL353746.1 + 0.1515 × AC139722.1 + 0.3576 × AC034223.1 + 0.1977 × LINC02310
miRNA model: 0.1544 × hsa-mir-3607 + (-0.1228 × hsa-mir-3189) + (-0.2297 × hsa-mir-490) + (-0.2874 × hsa-mir-5571) + 0.0754 × hsa-mir-31 + 0.0452 × hsa-mir-196b + 0.1952 × hsa-mir-1293 + 0.1197 × hsa-mir-548f-1
The three models showed considerable stability in the whole group and in the two randomized groups. Overall survival was significantly shorter in the high-risk group than in the low-risk group under all three models (P = 1.92E-10, P = 1.75E-09 and P = 0, respectively), suggesting that the three models are of great value in predicting the survival and prognosis of LUAD. Univariate and multivariate Cox analysis in the different groups showed that all three models were independent predictors of LUAD (high risk vs. low risk: HR > 1, P < 0.05).

Conclusion:
1) Advanced bioinformatics methods such as WGCNA, ceRNA network analysis and Cox proportional hazards regression models can help to explain and supplement the possible mechanism of LUAD and lay a foundation for basic research on LUAD.
2) Studies of biomarkers and signaling mechanisms are particularly important for exploring the occurrence and development of LUAD.
3) The lncRNA-miRNA-mRNA ceRNA mechanism plays a main role in the molecular genesis of LUAD and provides a new theoretical basis for targeted therapy of LUAD.
4) Cox risk models based on mRNAs, lncRNAs and miRNAs can better predict the survival prognosis of patients with LUAD, which helps to define independent molecular-level predictors of LUAD, to screen patients with a high-risk prognosis, and to guide individualized treatment plans.
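
The linear risk score and median split described in the Methods and Results can be illustrated with a minimal Python sketch. It uses the eight miRNA coefficients reported in the abstract (reading the last miRNA name as hsa-mir-548f-1, which is an assumption about the garbled source text). The sample names and expression values are hypothetical, and the abstract does not state the expression scale or how ties at the median were handled, so the "strictly greater than the median" rule here is also an assumption. This is not the author's code, which was written in R and Perl.

```python
from statistics import median

# Coefficients of the miRNA model as reported in the abstract.
# "hsa-mir-548f-1" is an assumed reading of the garbled name in the source.
MIRNA_COEFFICIENTS = {
    "hsa-mir-3607": 0.1544,
    "hsa-mir-3189": -0.1228,
    "hsa-mir-490": -0.2297,
    "hsa-mir-5571": -0.2874,
    "hsa-mir-31": 0.0754,
    "hsa-mir-196b": 0.0452,
    "hsa-mir-1293": 0.1952,
    "hsa-mir-548f-1": 0.1197,
}


def risk_score(expression, coefficients=MIRNA_COEFFICIENTS):
    """Linear risk score: sum of coefficient * expression over the model's genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label a sample 'high' if its score is strictly above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {name: ("high" if score > cutoff else "low") for name, score in scores.items()}


# Hypothetical expression values, for illustration only.
zeros = {gene: 0.0 for gene in MIRNA_COEFFICIENTS}
samples = {
    "sample_A": {gene: 1.0 for gene in MIRNA_COEFFICIENTS},
    "sample_B": dict(zeros),
    "sample_C": {**zeros, "hsa-mir-5571": 2.0},
    "sample_D": {**zeros, "hsa-mir-1293": 3.0},
}

scores = {name: risk_score(expr) for name, expr in samples.items()}
groups = split_by_median(scores)

for name in samples:
    print(f"{name}: score={scores[name]:+.4f} group={groups[name]}")
```

In a real analysis the resulting group labels would then be passed to a Kaplan-Meier comparison and a time-dependent ROC analysis, as the abstract describes.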
Keywords/Search Tags:Lung adenocarcinoma, WGCNA, ceRNA, COX analysis, TCGA, Bioinformatics, Prognosis