| Clinical medical event extraction is the automatic extraction, from clinical data, of a main medical entity together with the related entities and attribute information concerning the diagnosis and treatment of a disease. Multiple extracted medical events can be combined into a clinical event knowledge graph, which reflects changes in the state of affairs and supports drug discovery, medical decision-making, evidence-based medical practice, disease surveillance, and similar applications. The task is usually divided into two subtasks: event element extraction and event element association. Event element extraction identifies and extracts medical entities and attribute information. Event element association combines the extracted medical entities and attributes into a complete medical event. The constituent elements of clinical medical events have ambiguous boundaries and nested structures, and element association must account for the dependencies between elements, which are closely tied to contextual information. In addition, few labeled datasets are available for training. Together, these factors make clinical medical event extraction very challenging. To address these problems, this work focuses on the following research: (1) A Chinese medical named entity recognition method based on a reading comprehension framework is proposed for the problem of blurred entity boundaries and diverse nested forms in event elements; entity recognition is modeled as a machine reading comprehension task. The method uses questions to distinguish different entity types, uses BERT to relate the reading comprehension questions to the medical text, introduces a multi-head attention mechanism to strengthen the semantic connection between questions and nested entities, and finally uses two classifiers to predict the start and end positions of entities. The experimental results show that the method achieves the best results, with an overall F1 value of 67.65%, which is a
7.17% improvement over the classical BiLSTM-CRF, with a 16.81% improvement for clinical presentation entities with nested entities. (2) A Chinese medical event extraction method incorporating multiple features is proposed for the problem that event element association must consider the dependencies between elements and that contextual features are difficult to obtain. The method uses an improved Chinese-RoBERTa to build the embedding and feature extraction parts of the model, and adds multi-word sliding window features to the CRF layer to improve the model's feature transfer capability. The experimental results show that the method achieves an overall F1 value of 80.21% on the three attribute entities, which is 15.11% better than the classical BiLSTM-CRF model, and that fusing multiple features gives a result 4.71% better than the original CRF features alone. (3) To address the scarcity of labeled clinical medical event data, a semi-supervised learning approach is used to expand the labeled dataset, and a high-confidence pseudo-labeled data selection algorithm is proposed. The method predicts labels for unlabeled data with a model that combines BERT with the multi-feature CRF, and filters the predictions with the high-confidence selection strategy to obtain 300 higher-quality samples. These are merged with the original data to build a corpus of 1700 samples, on which the model is retrained; the final F1 value reaches 81.56%, an improvement of 1.35%. |
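
The high-confidence pseudo-label selection in (3) can be illustrated with a minimal sketch. The abstract does not say how confidence is computed, so everything here is an assumption for illustration: the sentence score is taken to be the lowest per-token probability of the predicted tag sequence, and the names `sentence_confidence`, `select_pseudo_labeled`, `threshold`, and `max_items` are hypothetical rather than taken from the original work.

```python
from typing import List, Sequence, Tuple

# One model prediction: (sentence id, probability of the predicted tag per token).
Prediction = Tuple[str, Sequence[float]]


def sentence_confidence(token_probs: Sequence[float]) -> float:
    """Assumed scoring rule: the least certain token sets the sentence score."""
    return min(token_probs) if token_probs else 0.0


def select_pseudo_labeled(
    predictions: Sequence[Prediction],
    threshold: float = 0.9,   # assumed cut-off, not from the source
    max_items: int = 300,     # the abstract reports 300 selected samples
) -> List[str]:
    """Return ids of the most confident predictions, best first."""
    scored = [(sentence_confidence(probs), sid) for sid, probs in predictions]
    # Keep only predictions at or above the confidence threshold.
    kept = [(score, sid) for score, sid in scored if score >= threshold]
    # Highest confidence first; ties are broken by id for determinism.
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [sid for _, sid in kept[:max_items]]


# Toy predictions on unlabeled sentences.
preds = [
    ("s1", [0.99, 0.97, 0.95]),
    ("s2", [0.99, 0.60, 0.98]),  # one uncertain token, so it is rejected
    ("s3", [0.96, 0.94]),
]
selected = select_pseudo_labeled(preds, threshold=0.9, max_items=2)
print(selected)
```

In a pipeline of this kind, the selected sentences and their predicted tags would be merged with the original labeled data before retraining. Taking the minimum is a deliberately strict choice; a mean or a CRF sequence probability would be equally plausible alternatives.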