Knowledge Graph Construction And Application For Radiation-Associated Genes

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
GTID: 2370330614970440
Subject: Bioinformatics
Abstract/Summary:
Ionizing radiation (hereinafter referred to as “radiation”) is high-energy radiation capable of separating electrons or other particles from atoms or molecules, thereby ionizing them. When radiation acts on the body, it transfers energy to the body's molecules, cells, tissues, and organs, resulting in changes in their morphology, structure, and function. Exposure to radiation above a certain dose causes serious damage to the human body, including hematopoietic disorders, organ damage, cancer, and even death. Therefore, it is of great significance to study radiobiological effects. However, the mechanisms underlying radiobiological effects are very complex. Radiation can act directly on various types of biomolecules and affect cell function at the molecular level. For example, radiation can directly lead to the generation and accumulation of genetic mutations, which can cause cancer. In addition, radiation can disrupt intracellular signaling pathways, leading to more extensive damage to the body, such as inflammation. At present, a large amount of omics data and literature information related to the molecular mechanisms of radiation is scattered across millions of biomedical publications and databases, which poses great challenges to the study of the molecular mechanisms of radiobiological effects. There is an urgent need to use bioinformatics methods to organize and extract unstructured knowledge from omics data and literature resources so that it can be applied to further studies in radiation biomedicine. To address this challenge, this thesis conducts multiple studies on the construction and application of a radiation-associated gene knowledge graph.

First, this thesis developed an informatics system for the construction of a radiation-associated gene knowledge graph. After investigating the technical methods available for knowledge graph construction, the top-down strategy was chosen. Multiple types of radiation-related biomedical data were integrated, including literature, GO biological processes, KEGG biological pathways, MeSH terms, GEO gene expression datasets, and so on; an efficient biomedical entity recognition system was established based on synonyms; a radiation-associated gene extraction method was developed based on co-occurrence relationships; and 0.13 was selected as the appropriate threshold for determining a co-occurrence relationship. Together, these studies laid the foundation for the extraction of radiation-related knowledge and the establishment of the knowledge graph.

Then, this thesis constructed a biomedical knowledge graph centered on radiation-associated genes. Through analysis and manual interpretation of about 29 million biomedical publications, 598 genes related to nuclear radiation and 663 genes related to ultraviolet radiation were identified, and 611 differentially expressed genes, together with their expression datasets, were compiled from the GEO database. Genes associated with nuclear radiation and genes associated with ultraviolet radiation were found to share certain biological functions, such as response to stimulus, DNA damage repair, cell cycle regulation, and immune response. Meanwhile, each type of gene also has its own characteristics. Nuclear radiation-associated genes tend to have biological functions such as DNA repair (DNA level), cell division, chromosome segregation, and lymphocyte immune regulation, while ultraviolet-related genes tend to have biological functions such as response to environmental stimuli, DNA repair (nucleotide level), and inflammatory response. An online knowledge base was further developed for genes related to nuclear radiation and ultraviolet radiation, with user-friendly access interfaces and abundant information on biological processes and pathways, facilitating the understanding of the list of genes related to nuclear radiation and ultraviolet radiation.

Finally, using the radiation-associated gene knowledge graph established in this thesis, research on radiation biological dosimeters was explored. A Random Forest-based radiation classification algorithm was first developed to identify biomarkers. Through investigation and analysis of six human peripheral blood gene expression datasets (GSE102971, GSE116162, GSE20162, GSE23393, GSE55869, and GSE57059), three datasets with high consistency (GSE102971, GSE20162, and GSE57059) were selected, and 5 radiation classification biomarker genes (DDB2, PHLDA3, PPM1D, PCNA, and GADD45A) were then found. These radiation classification biomarkers tend to participate in biological processes such as p53 signal transduction, radiation response, stress response, DNA damage repair, and cell cycle regulation. Thereafter, a radiation prediction model based on gene expression datasets was established. Five-fold cross-validation of the radiation prediction model demonstrated that it has good predictive performance.
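The synonym-based entity recognition and co-occurrence-based gene extraction described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the co-occurrence score is computed, so the Jaccard-style score below is an assumption, as are the toy synonym dictionary, the radiation term list, and all function names. Only the 0.13 threshold comes from the thesis.

```python
from collections import defaultdict

# Toy synonym dictionary (illustrative only): canonical gene -> lowercase surface forms.
GENE_SYNONYMS = {
    "GADD45A": {"gadd45a", "gadd45", "ddit1"},
    "PCNA": {"pcna"},
}
# Toy list of radiation terms (illustrative only).
RADIATION_TERMS = {"ionizing radiation", "gamma irradiation", "x-ray"}

THRESHOLD = 0.13  # threshold reported in the abstract


def find_genes(text):
    """Return canonical genes whose synonyms appear as whole tokens in the text."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    tokens = set(cleaned.split())
    return {gene for gene, names in GENE_SYNONYMS.items() if names & tokens}


def mentions_radiation(text):
    """Return True if any radiation term occurs in the text."""
    lowered = text.lower()
    return any(term in lowered for term in RADIATION_TERMS)


def cooccurrence_scores(abstracts):
    """Assumed Jaccard-style score: |gene AND radiation| / |gene OR radiation|."""
    gene_docs = defaultdict(int)   # documents mentioning each gene
    both_docs = defaultdict(int)   # documents mentioning gene and radiation
    radiation_docs = 0             # documents mentioning radiation
    for text in abstracts:
        has_radiation = mentions_radiation(text)
        radiation_docs += int(has_radiation)
        for gene in find_genes(text):
            gene_docs[gene] += 1
            if has_radiation:
                both_docs[gene] += 1
    scores = {}
    for gene, n_gene in gene_docs.items():
        union = n_gene + radiation_docs - both_docs[gene]
        scores[gene] = both_docs[gene] / union if union else 0.0
    return scores


def radiation_associated(abstracts, threshold=THRESHOLD):
    """Keep genes whose co-occurrence score reaches the threshold."""
    scores = cooccurrence_scores(abstracts)
    return {gene for gene, score in scores.items() if score >= threshold}
```

Calling `radiation_associated(list_of_abstract_strings)` returns the set of canonical gene names that pass the threshold. A real pipeline over millions of publications would need a proper tokenizer and a curated synonym resource rather than these toy dictionaries.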
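The Random Forest classification and five-fold cross-validation mentioned in the abstract might look roughly like the following sketch, which uses scikit-learn. The expression values are synthetic and generated only for illustration; they are not taken from any GEO dataset. The hyperparameters, function names, and the simulated shift between control and irradiated samples are all assumptions, not the thesis's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Biomarker genes named in the abstract, used here only as feature labels.
GENES = ["DDB2", "PHLDA3", "PPM1D", "PCNA", "GADD45A"]


def make_synthetic_expression(n_per_class=40, shift=2.0, seed=0):
    """Simulate expression values: irradiated samples are shifted upward (assumption)."""
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, 1.0, size=(n_per_class, len(GENES)))
    irradiated = rng.normal(shift, 1.0, size=(n_per_class, len(GENES)))
    X = np.vstack([control, irradiated])
    y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = control, 1 = irradiated
    return X, y


def five_fold_accuracy(X, y, seed=0):
    """Return the accuracy of a Random Forest on each of five stratified folds."""
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy")


X, y = make_synthetic_expression()
scores = five_fold_accuracy(X, y)

# Feature importances from a model fitted on all samples give one way to rank candidate genes.
fitted = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(GENES, fitted.feature_importances_), key=lambda pair: pair[1], reverse=True)
```

Stratified folds keep the ratio of control to irradiated samples the same in every split, which matters when expression datasets are small. Because all five synthetic features are drawn from the same distributions, the ranking here carries no biological meaning.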
Keywords/Search Tags: Radiation, Genes, Knowledge Graph, Biomarker