Research On Key Technologies Of Knowledge Extraction For Medical Literature

Posted on: 2022-04-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 1484306611954779
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Medical guidelines, specifications, academic papers and other documents contain a great deal of valuable knowledge. Extracting structured knowledge from them, constructing knowledge graphs and applying these to practical deduction and reasoning scenarios has become a research hotspot. Knowledge extraction involves three tasks: Named Entity Recognition (NER), Relationship Extraction (RE) and Entity Resolution (ER). Because manual labelling is laborious, pre-trained models are widely used in this field to improve performance. In this dissertation, pre-trained models for Chinese Natural Language Processing (NLP) tasks, such as ERNIE and BERT, are introduced into medical knowledge extraction. First, a data augmentation method based on synonym replacement is proposed, and the transfer learning methods of the pre-trained models are enhanced to improve named entity recognition. Second, entity information is used to improve the accuracy of the RE task. Third, a method based on knowledge graph representation learning is used to perform entity resolution and reduce the redundancy and sparsity of the knowledge graph. To verify the proposed knowledge extraction methods, a knowledge graph and a drug recommendation system are built based on the "Standards of care for type 2 diabetes in China (2020 Edition)". Experts believe that the system is applicable in clinical auxiliary diagnosis and treatment. The main work of this dissertation is as follows:

1. Research on data augmentation and Named Entity Recognition based on pre-trained models

Named Entity Recognition aims to identify medical entities in sentences. To overcome the shortage of labelled training data, a data augmentation method based on a pre-trained model is proposed. Using the masked language model (MLM) function of the pre-trained model ERNIE, words in the non-entity parts of the literature are replaced with synonyms to form "new literature", which is added to the original training set. In addition, two transfer learning methods for applying pre-trained models to Named Entity Recognition are improved. For the feature-based method, a hybrid model and its hybrid strategy are proposed to replace the fully connected layer, whose feature extraction ability is poor. The model combines the sequence modeling ability of Recurrent Neural Networks (RNN) with the local information extraction ability of Convolutional Neural Networks (CNN). For the fine-tuning method, the feature at the [CLS] position is trained jointly, which introduces sentence-level features. With these two improvements, the F1 values on the "Yidu-4K dataset" and the "Chinese diabetes annotated dataset" are 0.8575 and 0.7947 respectively.

2. Research on a Relationship Extraction method based on pre-trained models

Relationship Extraction aims to determine the relationship between discrete medical entities. Traditional RE methods treat it purely as a text classification task and ignore the information carried by the entities. This dissertation improves the sentence classification function of the pre-trained model by introducing information about the entity pair, which improves relationship extraction. Specifically, the entity tags [ES] and [EE] are added to the input sequence, and an entity category vector is added to the input vector; the features corresponding to [CLS], [ES] and the entities are each used when computing the relationship classification. The improved method is evaluated on the "SemEval 2010 Task 8 dataset" and the "Chinese diabetes annotated dataset", where the F1 values are 0.8960 and 0.9210 respectively.

3. Research on an Entity Resolution method based on knowledge graph representation learning

Entity Resolution aims to identify and merge entities in a knowledge graph that are expressed differently but have the same meaning. Traditional entity unification methods do not consider the association information between entities. This dissertation proposes an Entity Resolution method based on knowledge graph representation learning. After candidate entity pairs are screened by edit distance, vector representations of the entities are obtained by fitting the triples with the TransE, TransH and TransR models, and the similarity between entities is computed from these vector representations. The method is studied on the "Chinese diabetes labeled dataset" provided by Ali Tianchi. The highest F1 value, 0.7527, is obtained when the TransH model is selected and the similarity threshold is set to 0.5.

4. Construction of a knowledge graph based on the knowledge extraction models and implementation of a personalized drug recommendation system

To verify the knowledge extraction methods, a knowledge graph and a drug recommendation system are built based on the "Standards of care for type 2 diabetes in China (2020 Edition)", with personalized drug recommendation as the goal. The constructed knowledge graph contains 761 medical entities, 10 entity relationships and 1244 knowledge triples, which are stored in the graph database Neo4j. The drug recommendation system combines diseases and symptoms with the drug indications and side effects in the knowledge graph for drug use reasoning, and also analyzes the similarity between patients based on user portraits for drug recommendation. As a result, the drug guidance is both scientific and personalized rather than overly simplistic. The system has been applied at a health promotion service station, and experts believe that it is applicable in clinical auxiliary diagnosis and treatment.
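The entity-marking step described for Relationship Extraction can be illustrated with a minimal sketch. The abstract only states that [ES] and [EE] tags and an entity category vector are added to the input; the function name build_re_input, the (start, end, category) span format, the numeric category ids and the example sentence below are all assumptions made for illustration, not the dissertation's implementation.

```python
from typing import List, Tuple

ES, EE = "[ES]", "[EE]"  # entity start / end tags named in the abstract


def build_re_input(
    tokens: List[str], entities: List[Tuple[int, int, int]]
) -> Tuple[List[str], List[int]]:
    """Wrap each entity span with [ES] ... [EE] and build aligned category ids.

    entities holds (start, end, category) with end exclusive; spans are
    assumed not to overlap. Category 0 means "not part of an entity".
    """
    start_cat = {s: c for s, _, c in entities}
    ends = {e for _, e, _ in entities}
    seq, cats, current = ["[CLS]"], [0], 0
    for i, tok in enumerate(tokens):
        if i in ends:  # close the span that ended just before this token
            seq.append(EE)
            cats.append(current)
            current = 0
        if i in start_cat:  # open a new span at this token
            current = start_cat[i]
            seq.append(ES)
            cats.append(current)
        seq.append(tok)
        cats.append(current)
    if len(tokens) in ends:  # a span that runs to the end of the sentence
        seq.append(EE)
        cats.append(current)
    seq.append("[SEP]")
    cats.append(0)
    return seq, cats


# Hypothetical example: category 1 = drug, 2 = disease.
tokens = ["metformin", "treats", "type", "2", "diabetes"]
seq, cats = build_re_input(tokens, [(0, 1, 1), (2, 5, 2)])

# Positions whose hidden states a classifier could read alongside [CLS].
es_positions = [i for i, t in enumerate(seq) if t == ES]
print(seq)
print(cats)
print(es_positions)
```

In a real model the category ids would index an embedding table whose output is added to the token embeddings, and the hidden states at position 0 ([CLS]) and at es_positions would feed the relation classifier; how the dissertation combines these features is not specified in the abstract.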
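The two-stage Entity Resolution procedure (screen candidate pairs by edit distance, then compare entity vectors) can be sketched as follows. This is a sketch under stated assumptions: the entity vectors are hand-written toy values standing in for embeddings that TransE, TransH or TransR would learn from the triples, cosine similarity is assumed as the similarity measure, and the max_edit cutoff is hypothetical. Only the 0.5 similarity value comes from the abstract.

```python
import math
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def resolve(embeddings: Dict[str, List[float]],
            max_edit: int = 3,           # hypothetical screening cutoff
            min_sim: float = 0.5) -> List[Tuple[str, str]]:
    """Return entity pairs that pass both the string and the vector test."""
    names = sorted(embeddings)
    merged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if edit_distance(a, b) > max_edit:
                continue  # screened out before any vector comparison
            if cosine(embeddings[a], embeddings[b]) >= min_sim:
                merged.append((a, b))
    return merged


# Toy vectors (assumed, not learned): similar spelling does not imply
# similar graph context, and similar context does not imply similar spelling.
toy = {
    "metformin": [1.0, 0.0],
    "metformine": [0.9, 0.1],
    "melformin": [0.0, 1.0],
    "insulin": [1.0, 0.1],
}
pairs = resolve(toy)
print(pairs)  # [('metformin', 'metformine')]
```

Here "insulin" has a vector close to "metformin" but is removed by the edit-distance screen, while "melformin" passes the screen but is rejected on vector similarity, which shows why the two stages complement each other.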
Keywords/Search Tags:Medical literature, Knowledge extraction, Named entity recognition, Relation extraction, Entity resolution, Pre-training model, Drug recommendation