
Research On Diabetes Knowledge Extraction For Chinese Journal Articles

Posted on: 2022-12-20 | Degree: Master | Type: Thesis
Country: China | Candidate: A D Chen | Full Text: PDF
GTID: 2494306776492124 | Subject: Library Science and Digital Library
Abstract/Summary:
With breakthroughs in artificial intelligence and the steady expansion of big data applications, medical care, as an important part of the national economy and people's livelihood, has received extensive attention. The development of artificial intelligence has also deepened its application in the medical field, meeting the public's growing demand for medical applications. Applications such as intelligent medical services, intelligent consultation, and medical rumor identification have profoundly changed people's way of life. Mining medical information from massive text collections and extracting important medical knowledge from it is therefore of great significance to the development of intelligent medical care. This problem can be addressed effectively by combining professional medical information with deep learning. Medicine is a highly specialized field with many branches, complex concept attributes, and many relation categories, so high-quality professional medical data is an important basis for good deep learning model performance.

This thesis therefore carries out the following tasks. UMLS (Unified Medical Language System) is used as the professional medical data standard, and the Chinese translations of the medical concepts defined in UMLS are scraped using Selenium. The UMLS definitions of medical concepts and relationships serve as the basis for data labeling and for determining the entity concept and relationship categories; in total, 7 concept types and 14 relationship types are defined. The goal is to extract knowledge about diabetes from the bibliographic records of Chinese journal articles. The abstracts of the journal articles are annotated according to the specified concept and relationship types, and machine learning and deep learning methods are then trained for named entity recognition: a hidden Markov model, a conditional random field (CRF) model, BiLSTM, BiLSTM-CRF, BERT-BiLSTM-CRF, and BERT-CRF are each trained on the labeled data, and their entity recognition performance is compared.

After entity recognition, entity alignment is performed on the extracted entities. Synonyms of medical concept entities are obtained from UMLS and Baidu Translate and serve as the data for model training. Word2vec, GloVe, and node2vec are used to vectorize the entities, and entity alignment is achieved by calculating the semantic distance between vectors; Word2vec combined with part-of-speech tagging is used as an optimization. Finally, a BiLSTM model with an attention mechanism and a BERT model are used to extract the relationships between entities, completing the knowledge extraction task for diabetes in Chinese journal articles.

Different models are evaluated according to the characteristics of the three tasks. In the named entity recognition task, the CRF model performs best, with an F1 value of 80%. In the entity alignment task, the Top-N indicator is used to compare the alignment methods; the Word2vec + part-of-speech tagging method reaches an accuracy of 68.9%, the best result in this experiment. In the relation extraction task, the BERT-based model performs best, with an F1 value of 91%.
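The entity alignment step described in the abstract (ranking candidate concepts by the semantic distance between entity vectors and scoring the ranking with a Top-N indicator) can be illustrated with a minimal sketch. This is not the thesis's implementation: cosine similarity is assumed as the distance measure, the function names are hypothetical, and the three-dimensional vectors are made-up toy values standing in for Word2vec, GloVe, or node2vec embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_candidates(query_vec, candidates, n):
    """Return the names of the n candidate concepts most similar to query_vec."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:n]]


def top_n_accuracy(queries, gold, candidates, n):
    """Fraction of mentions whose gold concept appears among their top-n candidates."""
    hits = sum(
        1
        for mention, vec in queries.items()
        if gold[mention] in top_n_candidates(vec, candidates, n)
    )
    return hits / len(queries)


# Toy data (hypothetical): standard concepts and extracted mentions with made-up vectors.
candidates = {
    "diabetes mellitus": [0.9, 0.1, 0.0],
    "insulin": [0.1, 0.9, 0.1],
    "hypertension": [0.0, 0.2, 0.9],
}
queries = {
    "diabetes": [0.8, 0.2, 0.1],
    "insulin injection": [0.2, 0.8, 0.0],
    "high blood pressure": [0.5, 0.6, 0.4],
}
gold = {
    "diabetes": "diabetes mellitus",
    "insulin injection": "insulin",
    "high blood pressure": "hypertension",
}

for n in (1, 3):
    print(f"Top-{n} accuracy: {top_n_accuracy(queries, gold, candidates, n):.3f}")
```

On this toy data two of the three mentions are aligned correctly at Top-1, and all three at Top-3, which shows why the reported accuracy depends on the chosen N.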
Keywords/Search Tags:Knowledge extraction, Entity alignment, Diabetes, Deep learning