| As the carrier of patients’ clinical information, electronic medical records hold a large amount of clinical data that can serve as a reference and basis for subsequent diagnosis, treatment, and research. However, electronic medical records are unstructured or semi-structured texts written in natural language, which greatly limits their effective use. Research on information extraction from clinical medical text is therefore of great significance: using natural language processing to extract useful information from clinical texts is an effective way to improve the utilization of electronic medical records. This thesis studies two information extraction tasks for clinical medical text, named entity recognition and entity relation extraction, with the aim of extracting the medical entities and entity relations contained in the text. The work is of value to related tasks in the medical field such as automatic question answering, knowledge graphs, and information retrieval. Previous methods rely mainly on sequence labeling models, which have two problems: the model lacks external knowledge, and it has difficulty with nested entities. This thesis proposes to recast named entity recognition and entity relation extraction as machine reading comprehension tasks, taking advantage of the similarity between span-extraction reading comprehension and information extraction. The two tasks are arranged as upstream and downstream tasks, and a deep learning model is built for each. Within the machine reading comprehension framework, task-related prior knowledge is integrated into the model through manually designed questions, which is the main improvement of this thesis. This approach addresses the limitation that a sequence labeling model can be built only on the text itself and cannot take advantage of
external knowledge. In addition, to handle nested entities, the answer prediction module of the named entity recognition model uses a boundary model to decode the positions of entity mentions. In the embedding module of the entity relation extraction model, entity category tags and cross-sentence information are added to strengthen the model’s relation extraction capability. The proposed models were evaluated on two clinical datasets. Compared with the sequence labeling model, the F1 scores of the named entity recognition model on the CANTEMIST and N2C2 datasets improve by 14 and 12 percentage points, respectively. On the N2C2 dataset, the F1 score of the entity relation extraction model improves by 7 percentage points over that of the BERT model. |
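
The reading comprehension formulation described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the question wording, the entity labels `Drug` and `Dosage`, the function names, and the threshold-based pairing rule are all assumptions made for illustration, and the actual boundary model may decode spans differently. The sketch shows two ideas: each entity type is queried with its own question over the same text, and start and end boundaries are scored independently, so overlapping or nested spans can be returned together.

```python
from typing import Dict, List, Tuple

# Hypothetical questions: the abstract does not give the actual wording.
QUESTIONS: Dict[str, str] = {
    "Drug": "Which medications are mentioned in the text?",
    "Dosage": "Which medication dosages are mentioned in the text?",
}


def build_mrc_inputs(tokens: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Pair the same context with one question per entity type."""
    return [(label, question, tokens) for label, question in QUESTIONS.items()]


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,
    max_len: int = 10,
) -> List[Tuple[int, int]]:
    """Return all inclusive (start, end) token spans whose two boundaries
    both score at or above the threshold (an assumed pairing rule)."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    # Each start may pair with any later end, which permits nested spans.
    return [(i, j) for i in starts for j in ends if 0 <= j - i < max_len]


# Made-up boundary scores for a three-token mention such as "left lung cancer".
spans = decode_spans([0.9, 0.8, 0.1], [0.1, 0.2, 0.95])
print(spans)  # [(0, 2), (1, 2)]: the span (1, 2) is nested inside (0, 2)
```

A sequence labeling model assigns one tag per token and so cannot return both spans at once, whereas independent boundary scores allow it. In practice the boundary scores would come from a trained encoder applied to the question and the text rather than from hand-written lists.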