The volume of text data carrying medical information is growing explosively. However, the multi-source heterogeneity and loose organization of texts on the Internet hinder the mining and use of this information. In response, extracting structured information from medical texts and then building a high-quality medical knowledge graph to store and manage medical knowledge is an effective way to mine medical information. Medical relation extraction, which aims to identify medical entities and the relations between them in natural-language text, is an important task in constructing a medical knowledge graph. The triples it produces are a necessary component of the data layer of the medical knowledge graph. To date, medical relation extraction frameworks have been obtained by adapting generic-domain methods to the lexical and syntactic features of medical texts. However, the popular sequence tagging-based models suffer from overlapping triples or inefficient training. Moreover, the lack of manually labeled training data in the medical domain makes it difficult to apply supervised models to medical relation extraction. To address these two challenges, we study relation extraction methods in medical scenarios, design a supervised model and a few-shot framework based on sequence tagging, and implement a semi-automatic system for constructing medical knowledge graphs. The main contents and contributions are as follows: 1) We construct a supervised joint medical relation extraction model based on a bidirectional tree tagging scheme. First, based on the tree-like relation structure found in medical texts, we propose a fine-grained division method for samples with overlapping triples. Second, we design a bidirectional tree tagging scheme to process the different types of samples and transform the relation information into
tags. Last, we develop a joint extraction model that predicts the bidirectional tree tagging sequences. 2) We construct a sequence tagging framework for few-shot relation extraction and propose two models based on it. First, we put forward a definition of the few-shot relation extraction task. Second, we apply metric-based few-shot approaches to supervised sequence tagging-based models and propose a few-shot relation extraction framework consisting of three components: the BERT encoder, the adaptive encoder, and the metric matrix calculation. Last, we apply our bidirectional tree tagging scheme and the existing handshake tagging scheme to the framework, yielding two few-shot models for relation extraction. We also design an approximate acceleration strategy to improve the efficiency of training and inference. 3) We conduct extensive experiments and analyses to verify the effectiveness of the proposed framework and models. First, we evaluate our supervised joint medical relation extraction model on two medical datasets and three generic datasets. Compared with the best baseline, the proposed model improves the F1 score by 2.1% to 2.5% on the medical datasets, and it trains more efficiently than other baselines with similar performance. Second, we use the NYT dataset to construct two few-shot relation extraction tasks. The average F1 scores of our few-shot framework exceed those of the best baseline by 9.2% to 23%. In addition, we conduct ablation experiments and a case study to further demonstrate the effectiveness of our framework and models. 4) We build a semi-automatic medical knowledge graph construction system that uses the proposed models as its relation extraction module. We add a data tagging platform interface, a graph database display interface, and related functions to develop the system for constructing medical knowledge graphs. Our system provides functions such as iterative
training of relation extraction models, extraction of triples from natural-language text, modification of labeled samples, and visual management of medical knowledge graphs.
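
To make the overlapping-triple problem mentioned in this abstract concrete, the following Python sketch labels the triples of one sentence with the coarse overlap categories commonly used in the joint-extraction literature: Normal, SingleEntityOverlap (SEO), and EntityPairOverlap (EPO). It is only an illustration under stated assumptions: the function name `overlap_types` and the example triples are hypothetical, and these coarse categories are not the fine-grained division method proposed in this work.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# A triple is (subject, relation, object); this layout is an assumption.
Triple = Tuple[str, str, str]


def overlap_types(triples: Iterable[Triple]) -> Set[str]:
    """Return the coarse overlap categories present among a sentence's triples.

    EPO: at least two triples share the same entity pair (in either direction).
    SEO: at least two triples share one entity but link it to different partners.
    Normal: neither kind of overlap occurs.
    """
    unique = set(triples)

    # Group triples by their unordered entity pair.
    triples_by_pair = defaultdict(set)
    for subj, rel, obj in unique:
        triples_by_pair[frozenset((subj, obj))].add((subj, rel, obj))

    # For each entity, collect the distinct entity pairs it takes part in.
    pairs_by_entity = defaultdict(set)
    for pair in triples_by_pair:
        for entity in pair:
            pairs_by_entity[entity].add(pair)

    found = set()
    if any(len(group) > 1 for group in triples_by_pair.values()):
        found.add("EPO")
    if any(len(pairs) > 1 for pairs in pairs_by_entity.values()):
        found.add("SEO")
    return found or {"Normal"}


# Hypothetical example triples, made up purely for illustration.
normal = [("aspirin", "treats", "headache")]
seo = [("aspirin", "treats", "headache"), ("aspirin", "causes", "nausea")]
epo = [("aspirin", "treats", "headache"), ("aspirin", "prevents", "headache")]
```

A tagging scheme that assigns a single tag per token cannot encode the SEO and EPO cases without losing triples, which is the kind of limitation that tagging schemes designed for overlapping triples aim to remove.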