
Identification And Study Of Potential Diagnostic Biomarkers For Early Hepatocellular Carcinoma

Posted on: 2024-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Hu
Full Text: PDF
GTID: 2544307064487354
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Hepatocellular carcinoma (HCC) has an insidious onset and is usually already at an advanced stage when it is detected, so the prognosis is poor. Existing biomarkers for hepatocellular carcinoma have low sensitivity and specificity. In this study, we therefore used bioinformatics methods to screen key genes for early hepatocellular carcinoma, constructed diagnostic models, and further characterized the key genes, providing a theoretical basis for identifying diagnostic markers of early hepatocellular carcinoma.

Methods:
1. Data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) database, and the International Cancer Genome Consortium (ICGC) database were acquired and preprocessed.
2. In the GEO dataset GSE54238, differential expression analysis was performed across early hepatocellular carcinoma, cirrhosis, chronic hepatitis, and normal samples to screen for genes that were significantly differentially expressed in patients with early hepatocellular carcinoma but not in patients with cirrhosis or hepatitis. Weighted gene co-expression network analysis (WGCNA) was performed on the early hepatocellular carcinoma and normal tissues of GSE54238. Its main steps were selecting the optimal soft threshold, identifying modules with the dynamic tree cut algorithm, identifying key genes associated with traits, and functional enrichment analysis of the genes. Hub genes were then screened with eight methods of the cytoHubba plug-in in Cytoscape (MCC, MNC, EPC, Degree, Closeness, Stress, Radiality, BottleNeck) and intersected with the differentially expressed genes from the analysis above to obtain the key genes of this study. The key genes were then validated in the TCGA data using three R packages (limma, edgeR, DESeq2).
3. For the key genes obtained by this screening, logistic regression was first used to explore the diagnostic value of each single gene by constructing its receiver operating characteristic (ROC) curve. We then used five machine learning algorithms (random forest, support vector machine, decision tree, naive Bayes, and neural network), combined with five-fold cross-validation to select the best model parameters, to build diagnostic models, which were externally validated on datasets from the ICGC and GEO databases.
4. For further exploration of the key genes, the association between the key genes and immune-infiltrating cells was first analyzed through the TIMER database. Second, the genetic variation of the key genes was explored through the cBioPortal database, and the transcription factor-gene network, miRNA-gene network, transcription factor-gene-miRNA network, and chemical-gene network were constructed on the NetworkAnalyst platform. The key genes were also submitted to the Enrichr platform, and candidate chemical drugs were screened by P-value using the DSigDB database. Finally, the key genes were further examined with box plots in the GSE49515 blood dataset.

Results:
1. Differential analysis identified 1096 genes that were differentially expressed only in early hepatocellular carcinoma and not significantly in chronic hepatitis or cirrhosis. In the WGCNA analysis, we identified 15 modules, of which the turquoise module and the yellow-green module were the most strongly correlated with clinical traits. Using scatter plots, we screened out a total of 527 key genes in these two modules. Screening in Cytoscape then yielded 12 hub genes. Intersecting these with the differential analysis results gave seven key genes: CCNB2, UBE2C, TYMS, PLK1, RFC4, PSMD2, and EZH2. In the TCGA validation, all of these genes except PSMD2 were statistically significant (P<0.05) with |logFC|>1 in all three methods, indicating that the remaining six key genes were significantly upregulated.
2. When used individually for diagnosis, each of the six key genes had an area under the ROC curve (AUC) above 0.8. In joint diagnosis, all five machine learning models performed well. The random forest model performed best, with an AUC of 0.9799 in the validation set. External validation in the ICGC dataset also favored the random forest model, with an AUC of 0.9372, and external validation in the GSE6764 dataset gave an AUC of 0.9083, indicating good predictive ability.
3. The six key genes were all correlated with B cells, CD4+ T cells, CD8+ T cells, macrophages, neutrophils, and dendritic cells (P<0.05). In the analysis of genetic variation, UBE2C had a high alteration frequency of 10%. In addition, for all key genes, mRNA expression increased when the copy number was slightly amplified.
4. On the NetworkAnalyst platform, we used the JASPAR database to predict the relationships between 41 transcription factors (TFs) and the 6 key genes, and the miRTarBase v8.0 database to predict the relationships between 97 miRNAs and the 6 key genes. The RegNetwork database was used to construct a TF-miRNA regulatory network consisting of 6 key genes, 46 miRNAs, and 51 TFs.
5. A protein-chemical network built with the CTD database consisted of 6 key genes and 232 chemicals. The top 10 chemicals were 7,8-dihydro-7,8-dihydroxybenzo(a)pyrene 9,10-oxide, benzo(a)pyrene, calcitriol, cyclosporine, cobaltous chloride, estradiol, methotrexate, testosterone, troglitazone, and vitamin K3. On the Enrichr platform, the DSigDB database was used to predict possible drug molecules, and the top 10 drugs by P-value were dimethylnitroether, troglitazone, vitamin K3, chromium, testosterone, calcitriol, antimony, enterolactone, potassium antimony tartrate, and fluorouracil. The chemicals predicted in both databases were troglitazone, testosterone, calcitriol, and vitamin K3.

Conclusions:
1. Using a variety of bioinformatics methods, six key genes of early HCC (CCNB2, EZH2, PLK1, RFC4, TYMS, UBE2C) were screened and verified; they may be potential diagnostic biomarkers for HCC.
2. The area under the ROC curve of all five machine learning algorithms was above 0.8, and the random forest model had the highest AUC and the best predictive ability.
3. Four compounds identified in both databases (calcitriol, testosterone, troglitazone, and vitamin K3) may be potential therapeutic agents for HCC.
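
The TCGA validation rule described in the abstract (a gene is retained only if it has P<0.05 and |logFC|>1 in all three differential expression methods) can be illustrated with a minimal Python sketch. This is not the thesis code, which used the R packages limma, edgeR, and DESeq2. The function names, the data layout, and the gene labels and values below are hypothetical and made up purely for illustration.

```python
# Minimal sketch of a "significant in every method" consensus filter.
# All names and numbers here are hypothetical; they are not thesis data.

def significant(result, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return genes with P < p_cutoff and |logFC| > lfc_cutoff.

    `result` maps gene -> (log fold change, P-value) for one method.
    """
    return {
        gene
        for gene, (lfc, p) in result.items()
        if p < p_cutoff and abs(lfc) > lfc_cutoff
    }


def consensus_genes(results_by_method, candidates):
    """Keep candidate genes that pass the cutoffs in every method."""
    kept = set(candidates)
    for result in results_by_method.values():
        kept &= significant(result)
    return sorted(kept)


# Made-up toy results for three placeholder genes.
toy_results = {
    "limma":  {"GENE_A": (2.1, 0.001), "GENE_B": (1.4, 0.02), "GENE_C": (0.4, 0.001)},
    "edgeR":  {"GENE_A": (2.3, 0.002), "GENE_B": (1.2, 0.20), "GENE_C": (0.5, 0.003)},
    "DESeq2": {"GENE_A": (1.9, 0.004), "GENE_B": (1.3, 0.03), "GENE_C": (0.3, 0.010)},
}

# GENE_B fails the P-value cutoff in one method; GENE_C fails |logFC| > 1.
print(consensus_genes(toy_results, ["GENE_A", "GENE_B", "GENE_C"]))  # ['GENE_A']
```

Requiring agreement across all methods is a conservative choice: a gene that one package misses is dropped, which in the abstract is the reason PSMD2 was excluded from the final six key genes.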
Keywords/Search Tags:Early hepatocellular carcinoma, Biomarkers, Weighted gene co-expression networks, Diagnostic models, Chemical drugs
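
The area under the ROC curve reported throughout the abstract equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The sketch below computes it that way in plain Python. It is an illustration under that definition only, not the thesis pipeline; the function name and the toy labels and scores are hypothetical.

```python
# Minimal sketch: AUC as the fraction of (positive, negative) pairs
# that the score ranks correctly. Toy inputs are made up.

def roc_auc(labels, scores):
    """Compute AUC for binary labels (1 = case, 0 = control)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for three cases and three controls.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based formula gives the same value more efficiently on large datasets.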