With the continuous development of medical informatization, the number of medical texts containing rich medical knowledge has increased sharply, making it difficult to use the valuable information in these texts efficiently. Knowledge graphs are widely used because they can represent massive amounts of structured knowledge and support fast knowledge queries. As a cornerstone of smart medical services and applications, the medical knowledge graph supports the development of clinical decision support, intelligent diagnosis guidance, and other applications. Most medical texts, however, are semi-structured or unstructured and cannot be stored directly in a medical knowledge graph. Relation extraction is therefore a key technology for constructing medical knowledge graphs, because it makes the semantic structure of medical texts explicit. This thesis studies entity relation extraction for different types of text. The main work is as follows:

(1) For semi-structured text in the medical field, with the goal of constructing a medicine knowledge graph, relation extraction is first carried out using the BERT-BiLSTM-CRF model. The model obtains a deep semantic representation of the input text sequence from the BERT language model and feeds it to the BiLSTM layer for further semantic encoding. The CRF layer models the dependencies between output labels to obtain the optimal label sequence for the text. The F1 values in the experiments reached 94.9% and 95.7%, respectively. Secondly, based on the semi-structured information already present in the drug text, structured relations between drugs and various entities are constructed. Finally, to address the limitations of multi-source drug knowledge and the differences in how it is expressed, the Chinese Medicine Knowledge Graph (CMKG) is constructed at two levels, the schema layer and the data
layer.

(2) For unstructured text in the medical field, a multi-head relation extraction model, RoKE-PN-Mhead, that integrates external medical knowledge is proposed. In the embedding layer, the model encodes the input text with the RoBERTa pre-trained model and incorporates external medical knowledge to enrich the semantic information of the text. The embedding-layer vectors are then fed into a pointer network for entity recognition, and a multi-head selection mechanism extracts entity relations from the resulting entity vectors. Experimental results on the medical dataset CMeIE show that, compared with other deep learning models, RoKE-PN-Mhead achieves better results on medical relation extraction, with an F1 value of 59.14%. The validity and extensibility of the model's relation extraction without external medical knowledge were verified on the general-domain dataset DuIE 2.0, where the F1 value reached 70.55%.
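
To illustrate how the CRF layer in part (1) uses label dependencies to pick an optimal label sequence, the following is a minimal sketch of Viterbi decoding in plain Python. It is not the thesis implementation: the function name `viterbi_decode`, the three-label scheme (O, B, I), and all scores are hypothetical, and the emission scores stand in for whatever the BiLSTM layer would output.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][k]   : score of label k at position t (assumed encoder output)
    transitions[j][k] : score of moving from label j to label k
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers: List[List[int]] = []  # best previous label for each (position, label)
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k]
                             + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(num_labels), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical labels: 0 = O, 1 = B, 2 = I. The transition O -> I is penalised.
transitions = [[0.0, 0.0, -100.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 0.0, 0.0],   # position 0 prefers O
             [0.0, 0.5, 1.0]]   # position 1 prefers I when taken on its own
print(viterbi_decode(emissions, transitions))
```

Choosing the best label at each position independently would give O followed by I, which is an invalid sequence under this labelling scheme. Because the transition score penalises that pair, the decoder returns O followed by B instead.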
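
The multi-head selection step in part (2) can be sketched in the same spirit. The abstract does not specify how RoKE-PN-Mhead computes its scores, so the example below only shows the general decoding idea as an assumption: every (token, head token, relation) combination receives a logit, and each combination whose sigmoid probability exceeds a threshold is kept, which allows one entity to take part in several relations. The function name `decode_multi_head`, the score layout, and the threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_multi_head(scores: Sequence[Sequence[Sequence[float]]],
                      threshold: float = 0.5) -> List[Tuple[int, int, int]]:
    """Return (token, relation, head) triples whose probability exceeds threshold.

    scores[i][j][r] is an assumed logit that token i is linked to head token j
    by relation r. Each combination is judged independently, so a token may
    have several heads and several relations.
    """
    triples = []
    for i, row in enumerate(scores):
        for j, relation_logits in enumerate(row):
            for r, logit in enumerate(relation_logits):
                if sigmoid(logit) > threshold:
                    triples.append((i, r, j))
    return triples


# Two tokens and two relation types, with made-up logits:
# token 0 is linked to token 1 by both relations, token 1 has no heads.
scores = [
    [[-5.0, -5.0], [3.0, 2.0]],
    [[-5.0, -5.0], [-5.0, -5.0]],
]
print(decode_multi_head(scores))
```

Scoring each combination with an independent sigmoid, instead of a softmax over relations, is what lets overlapping relations between the same pair of entities be extracted together.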