
Screening Of Sugar Chain Related Genes In Hepatocellular Carcinoma Based On Network Analysis And Machine Learning

Posted on: 2020-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Bai
Full Text: PDF
GTID: 2404330590495086
Subject: Biology
Abstract/Summary:
Cancer morbidity and mortality are high in China. With the rapid development of second-generation sequencing, using biostatistics and programming to address biological questions has become a hot topic, and bioinformatics plays an increasingly important role in scientific research. Sugar chain-related (glycan-related) genes, such as those encoding glycosyltransferases and glycoside hydrolases, have been shown to be closely associated with tumor migration, recurrence, and resistance to chemotherapeutic drugs, and many studies have examined how these genes and their functions affect cancer phenotypes and molecular mechanisms. Our team previously found, using RNA-seq data from the TCGA database, that the expression levels of many sugar chain-related genes change significantly in various cancer tissues. Building on these findings, this study focuses on the sugar chain-related genes that are differentially expressed in hepatocellular carcinoma and analyzes them with machine learning and weighted gene co-expression network analysis (WGCNA). The aim is to find sugar chain-related genes that play an important role in cancer development, together with other genes that change with them, to capture the changes in sugar chain-related genes on a larger scale, and to identify key (hub) genes and analyze their functions with bioinformatics methods.

In this study, we selected the expression profiles of glycan-related genes from the TCGA and GTEx databases and compared the ability of three machine learning models (random forest, support vector machine, and logistic regression) to predict cancer occurrence. The AUC values of the three models were 0.9836, 0.9903, and 0.9986, respectively. The confusion matrices showed that all three models predicted cancer samples better than normal samples. Comparing the AUC, confusion matrix, and error rate of the three models, logistic regression performed best. Using logistic regression, 16 statistically significant genes were screened: FUT7, FUT8, HYAL3, CHI3L1, PIGM, MGAT2, GLT6D1, AMY2B, A4GALT, LFNG, MAN1C1, PIGB, HEXB, NEU4, GALNT13, and FUT9.

To further study the interaction network of sugar chain-related genes, a weighted gene co-expression network was constructed from the TCGA and GTEx data. The absolute value of the Pearson correlation coefficient was calculated for every pair of genes. At the optimal weighting coefficient (soft-threshold power) of 6, R^2 reached its maximum, approaching 0.9, which gave the best-fitting network. On this basis, the gene similarity matrix was transformed into an adjacency matrix, and 13 co-expression modules were obtained. Module preservation was checked on a validation set, and the gold, turquoise, and blue modules were the best preserved (Z > 10). Correlation analysis between modules and phenotype showed that the turquoise and blue modules had the highest correlations with phenotype, 0.8 and 0.73 respectively, indicating that the turquoise module plays an important role in cancer development. GO and KEGG enrichment analysis showed that these two modules are enriched in many important biological pathways, such as protein transport and RNA localization.

Based on these results, we performed transcriptome validation for NEU4, an important gene identified by both machine learning and the turquoise module. In the NEU4-overexpression transcriptome, 15 of 83 potential transcription factors were differentially expressed, and all of them belonged to the turquoise module, which verified the reliability of the turquoise module and the importance of its genes in the occurrence and development of cancer. B4GALT2 and PLOD3 were also differentially expressed in the NEU4-overexpression transcriptome, supporting the accuracy and reproducibility of the network construction and machine learning.

Using machine learning and WGCNA, this study constructed a sugar chain-related gene interaction network closely associated with hepatocellular carcinoma and screened out important sugar chain-related genes. The results offer starting points for further exploring the biological functions and significance of these genes, clues for the glycobiology of hepatocellular carcinoma, and a theoretical basis and data support for its diagnosis and treatment.
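The abstract compares classifiers by AUC, confusion matrix, and error rate. As a minimal sketch of how those three quantities can be computed, the example below uses made-up labels and scores, not the thesis data; the function names `auc`, `confusion_matrix`, and `error_rate` and the 0.5 threshold are assumptions for illustration only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    labels: 1 = cancer, 0 = normal; scores: predicted probability of cancer.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # A cancer/normal pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_matrix(labels, scores, threshold=0.5):
    """Return (tp, fp, fn, tn) after thresholding the scores."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def error_rate(labels, scores, threshold=0.5):
    """Fraction of samples that are misclassified at the threshold."""
    tp, fp, fn, tn = confusion_matrix(labels, scores, threshold)
    return (fp + fn) / (tp + fp + fn + tn)


# Toy data (hypothetical): four cancer and four normal samples.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", auc(toy_labels, toy_scores))
    print("tp, fp, fn, tn:", confusion_matrix(toy_labels, toy_scores))
    print("error rate:", error_rate(toy_labels, toy_scores))
```

Reading the confusion matrix per class (tp against fn for cancer, tn against fp for normal) is what allows a statement such as "cancer samples are predicted better than normal samples", which a single AUC value cannot show.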
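The network step described in the abstract turns pairwise Pearson correlations into a weighted adjacency by taking the absolute value and raising it to the weighting coefficient, that is, a_ij = |cor(x_i, x_j)|^beta with beta = 6. The sketch below illustrates only that transformation in plain Python on toy expression vectors; the gene names and values are hypothetical, and it is not the pipeline used in the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency(expr, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    expr maps a gene name to its expression values across samples.
    beta = 6 is the weighting coefficient reported in the abstract.
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Toy expression profiles (hypothetical genes, four samples each).
toy_expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with gene_a
    "gene_c": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with gene_a
}

if __name__ == "__main__":
    for pair, weight in sorted(adjacency(toy_expr).items()):
        print(pair, round(weight, 4))
```

Because the absolute value is taken, strongly anti-correlated genes receive the same high weight as strongly correlated ones, and raising to the sixth power suppresses moderate correlations (0.8 becomes roughly 0.26).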
Keywords/Search Tags: glycan, hepatocellular carcinoma, WGCNA, machine learning