
Construction Of Gastric Cancer Knowledge Graph And Drug Discovery Application

Posted on: 2024-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Lu
Full Text: PDF
GTID: 2544306941963679
Subject: Computer technology
Abstract/Summary:
Gastric cancer has high incidence and mortality rates, a problem that is particularly prominent in China. To make full use of the growing body of medical literature, this thesis systematically investigates how to automatically extract biomedical knowledge from literature and related databases and construct a gastric cancer knowledge graph, named GCKG, that can be used for drug discovery. The main contents are as follows.

(1) Models for entity recognition, entity normalization, and relation classification were designed according to the characteristics of biomedical texts, and GCKG was constructed on top of them. Because medical entity names are long and irregular, the entity recognition model applies a Bidirectional Gated Recurrent Unit and an Interactive Pointer Network and focuses on identifying entity boundaries, which improves recognition accuracy; its average F1 on 8 entity recognition datasets reached 84.5%. For entity normalization, a model based on Term Frequency-Inverse Document Frequency and a Gated Attention Unit is proposed. It combines the semantic features and characteristic features of entities, and its average Hits@1 on 5 entity normalization datasets reached 95%. To address the complexity of medical knowledge expressed in text, the relation classification model integrates cross-text features, entity features, and context features to predict semantic relationships more accurately; its average F1 on 11 relation classification datasets is 86.9%. In addition, these models use a multi-task learning method based on hard parameter sharing, which effectively improves model performance and computation speed. The final GCKG defines 5 entity types and 5 relationship categories and contains 9129 entities and 88482 triples.

(2) Drug discovery was studied on the basis of the constructed GCKG. First, a biomedical knowledge embedding pre-trained language model, BioKGE-BERT, was constructed to transform the knowledge graph into knowledge embedding vectors. Then a drug-disease discriminant model based on CNN-BiLSTM was built, which uses the knowledge embedding vectors to predict whether a drug could treat gastric cancer. In the final results, 9 of the top 10 predicted drugs have been reported to be useful in treating gastric cancer, which supports the medical value of GCKG.

(3) An online platform for GCKG was developed to assist research on disease mechanisms. The platform consists of a subsystem for biomedical knowledge extraction and a subsystem for gastric cancer knowledge graph retrieval. The former builds on the constructed biomedical knowledge extraction models and provides general entity recognition, entity normalization, and relation extraction functions. The latter retrieves specific knowledge from GCKG and visualizes the results.
Keywords/Search Tags: Knowledge Graph, Drug Discovery, Knowledge Extraction, Pretrained Language Model, Knowledge Embedding
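
To make the entity normalization metric in the abstract concrete, the following is a minimal illustrative sketch of how a TF-IDF candidate ranker can be scored with Hits@1. It is not the thesis model: the Gated Attention Unit and the semantic features are omitted, and the use of character trigrams, the toy dictionary, and all function names (`char_ngrams`, `tfidf_vectors`, `normalize`, `hits_at_1`) are assumptions introduced here for illustration only.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(terms, n=3):
    """Build sparse TF-IDF vectors (dicts) for the dictionary terms."""
    docs = [Counter(char_ngrams(t, n)) for t in terms]
    doc_freq = Counter(g for d in docs for g in d)
    # Smoothed IDF; the exact smoothing is an assumption of this sketch.
    idf = {g: math.log((1 + len(terms)) / (1 + c)) + 1.0
           for g, c in doc_freq.items()}
    vecs = [{g: tf * idf[g] for g, tf in d.items()} for d in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(mention, terms, vecs, idf, n=3):
    """Map a mention to the dictionary term with the highest similarity."""
    counts = Counter(char_ngrams(mention, n))
    # N-grams never seen in the dictionary carry no weight.
    query = {g: tf * idf[g] for g, tf in counts.items() if g in idf}
    scores = [cosine(query, v) for v in vecs]
    best = max(range(len(terms)), key=scores.__getitem__)
    return terms[best]


def hits_at_1(pairs, terms):
    """Fraction of (mention, gold term) pairs whose top-ranked term is gold."""
    vecs, idf = tfidf_vectors(terms)
    correct = sum(normalize(m, terms, vecs, idf) == gold for m, gold in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy dictionary and mentions, not data from the thesis.
    terms = ["gastric cancer", "cisplatin", "fluorouracil", "trastuzumab"]
    pairs = [("gastric cancers", "gastric cancer"),
             ("cis-platin", "cisplatin"),
             ("5-fluorouracil", "fluorouracil")]
    print(hits_at_1(pairs, terms))
```

Hits@1 counts a mention as correct only when the single top-ranked dictionary entry is the gold entry, so it is the strictest member of the Hits@k family.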