
Named Entity Recognition And Relation Extraction For Traditional Chinese Medicine Knowledge Graph

Posted on: 2022-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: D Lei
Full Text: PDF
GTID: 2504306728471084
Subject: Computer software and theory
Abstract/Summary:
After thousands of years of inheritance, traditional Chinese medicine (TCM) has developed a unique and complete theoretical system that provides important guidance for clinical practice. With the development of modern society, the rapid flow of information, and the continual updating and supplementing of case data, a large amount of textual data has accumulated in the field of TCM. How to apply deep learning to better structure and visualize this valuable experience has become a significant problem. This study uses named entity recognition to determine the boundaries of TCM entities in Chinese text, and then uses relation extraction to classify the relationships between them. Finally, it identifies valuable information so that TCM data can be presented in a structured and visual way, with a view to promoting information retrieval, mining the laws of TCM identification, and exploring disease mechanisms in the field of TCM.

The study starts from the acquisition of clinical medical case data. After a series of processing steps, the original corpus for the named entity recognition and relation extraction experiments is constructed, followed by the study of named entity recognition and relation extraction in this field.

(1) Based on the existing semantic system in the field of TCM, entities and relationships were annotated under the guidance of experts using the BIOES sequence annotation method. Entities were divided into 6 categories and relationships into 5 categories according to the standards of the Traditional Chinese Medicine Language System (TCMLS), yielding the annotated corpus.

(2) To address the complexity of Chinese semantics and the difficulty of defining entity word boundaries, the named entity recognition model Lattice-LSTM is optimized to incorporate TCM-specific word information, in order to handle the polysemy, ambiguity and abstraction of terms in the field of TCM. Dice loss is also incorporated to address the label imbalance problem in this domain. Experiments show that the improved Lattice-LSTM performs better at named entity recognition in this domain.

(3) The entities derived from named entity recognition and the TCM-specific vocabulary information form entity pairs, and the Multi Tag ERNIE model is constructed to extract the relationships between them. Experiments show that the model achieves better results in relation extraction in the TCM domain.

Finally, the extracted entities and relations are disambiguated to form a structured knowledge graph. The extracted entity-relationship knowledge is used to construct a knowledge graph for the field of TCM, which stores and displays the information in TCM medical cases in a graph database to support the inheritance and development of TCM. This study also provides a new scheme for the structured, intelligent, digital and visual construction of TCM, and offers new ideas and methods for future domain-specific NLP.
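
The BIOES annotation scheme mentioned in step (1) can be illustrated with a minimal Python sketch. It is not the thesis's annotation tooling: the function name `spans_to_bioes`, the character-level tagging, the sample sentence, and the entity labels `SYMPTOM` and `PRESCRIPTION` are all assumptions made for illustration, since the abstract does not name the six entity categories.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_type); offsets are character positions.
Span = Tuple[int, int, str]


def spans_to_bioes(text: str, spans: List[Span]) -> List[str]:
    """Convert entity spans into one BIOES tag per character.

    B = begin, I = inside, E = end, S = single-character entity, O = outside.
    Assumes spans do not overlap (an assumption of this sketch).
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        if end - start == 1:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


if __name__ == "__main__":
    # Hypothetical sentence: "The patient has a headache and takes
    # Chuanxiong Cha Tiao San."
    sentence = "患者头痛，服用川芎茶调散"
    entities = [(2, 4, "SYMPTOM"), (7, 12, "PRESCRIPTION")]
    for char, tag in zip(sentence, spans_to_bioes(sentence, entities)):
        print(char, tag)
```

Compared with plain BIO tagging, the explicit `E-` and `S-` tags mark where an entity ends, which is the boundary information the abstract identifies as hard to determine in Chinese text.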
Keywords/Search Tags:Knowledge Graph, Natural Language Processing, Named Entity Recognition, Relation Extraction, Traditional Chinese Medicine