Named entity recognition (NER) on electronic medical records is an important prerequisite for medical informatization. However, because electronic medical records are highly sensitive and difficult to annotate, publicly available data resources are scarce, which makes it hard for NER in this field to reach a practical level. This paper therefore takes the pre-trained BERT model and the Transformer encoder as the basic architecture, introduces two transfer learning methods, multi-task learning and domain adaptation, and supplements them with a span-based decoder and adversarial training to improve the overall performance of NER on electronic medical records.

To address the insufficient extraction of entity-related features in existing NER models, we propose a multi-task label-wise Transformer model. On the one hand, to remedy the under-utilization of lower-layer information in the vertical multi-layer structure of the Transformer, we adopt a hierarchical multi-task transfer learning scheme built on two NER-related auxiliary tasks: entity boundary prediction and entity type prediction. On the other hand, to counter the random projection of attention heads in the lateral multi-head self-attention mechanism of the Transformer, we propose a label-wise unit combined with the multi-task learning mode: according to the actual label meaning of each layer, the attention heads are assigned different projection directions, encouraging them to participate more directly in the current task. Results on multiple datasets from different domains show that, without using any external resources, the multi-task label-wise Transformer model approaches the performance of NER models enhanced with dictionaries or with the structural information of Chinese characters.

To address the difficulty of transferring NER models between different electronic medical record corpora, we propose a domain adaptation method based on label sharing. The method constructs a shared encoder and a private encoder according to the entity types that the corpora have in common, and realizes cross-corpus NER on electronic medical records through partial transfer learning. First, we propose a multi-head self-attention mechanism with label-wise ability, which cooperates with the entity type prediction task to divide the labor between the two encoders. Then, in view of the annotation characteristics of electronic medical records, we use span-based decoding to identify nested entities and design an entity filtering algorithm based on stack and undirected-graph structures to adapt to non-nested data. Finally, we build adversarial samples by actively introducing perturbations, reducing the interference of data noise on cross-corpus knowledge transfer during training. Experimental results on two electronic medical record evaluation datasets show that the label-sharing domain adaptation method achieves higher recognition performance than common transfer learning methods.
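To make the label-wise idea concrete, the following is a minimal PyTorch sketch of a self-attention layer in which each attention head is tied to one entity-related label and supervised by an auxiliary per-token label classifier. The one-head-per-label assignment, the dimensions, and the auxiliary classifier are illustrative assumptions, not the exact configuration used in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelWiseSelfAttention(nn.Module):
    """Sketch: one attention head per entity-related label (assumption)."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.num_heads = num_labels                     # head <-> label assignment
        self.head_dim = hidden_size // num_labels
        proj_dim = self.num_heads * self.head_dim
        self.q_proj = nn.Linear(hidden_size, proj_dim)
        self.k_proj = nn.Linear(hidden_size, proj_dim)
        self.v_proj = nn.Linear(hidden_size, proj_dim)
        self.out_proj = nn.Linear(proj_dim, hidden_size)
        # Auxiliary per-token classifier over the same labels; its loss pushes
        # each head's projection direction toward evidence for "its" label.
        self.aux_classifier = nn.Linear(proj_dim, num_labels)

    def forward(self, x, attention_mask=None):
        # x: (batch, seq_len, hidden_size)
        B, T, _ = x.shape
        split = lambda t: t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.head_dim ** 0.5
        if attention_mask is not None:
            scores = scores.masked_fill(attention_mask[:, None, None, :] == 0, -1e9)
        context = torch.matmul(F.softmax(scores, dim=-1), v)   # (B, heads, T, head_dim)
        context = context.transpose(1, 2).reshape(B, T, -1)
        aux_logits = self.aux_classifier(context)              # label-wise auxiliary supervision
        return self.out_proj(context), aux_logits
```

In a multi-task setup, the auxiliary logits would be trained jointly with the main NER objective (e.g., against boundary or type labels), so that each head specializes rather than being projected randomly.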
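For the span-based decoder on non-nested data, a post-processing filter must discard candidate spans that conflict with each other. The sketch below treats overlapping candidates as neighbors in an undirected conflict graph and keeps spans greedily by score; it illustrates the general filtering idea only and is not the specific stack-and-graph algorithm designed in the thesis.

```python
from typing import List, Tuple

Span = Tuple[int, int, str, float]   # (start, end, label, score), end exclusive

def filter_overlapping_spans(spans: List[Span]) -> List[Span]:
    """Keep a non-overlapping subset of candidate spans, preferring high scores."""
    overlaps = lambda a, b: a[0] < b[1] and b[0] < a[1]   # edge in the conflict graph
    kept: List[Span] = []
    # Greedy selection by descending score: keeping a span implicitly removes
    # every candidate connected to it in the conflict graph.
    for span in sorted(spans, key=lambda s: s[3], reverse=True):
        if all(not overlaps(span, k) for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s[0])

# Hypothetical example: two nested candidates for the same disease mention;
# only the higher-scoring one survives on non-nested data.
print(filter_overlapping_spans([(0, 5, "Disease", 0.93), (2, 5, "Disease", 0.71)]))
```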
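The adversarial samples mentioned above are typically built by perturbing the embedding layer during training. The following is a minimal FGM-style sketch under the assumption of a PyTorch model whose embedding parameters contain "embedding" in their names; the epsilon value and parameter-name filter are illustrative, not the thesis' exact settings.

```python
import torch

class FGM:
    """Sketch of embedding-level adversarial perturbation (FGM-style, assumption)."""

    def __init__(self, model, epsilon: float = 1.0, param_name: str = "embedding"):
        self.model = model
        self.epsilon = epsilon
        self.param_name = param_name
        self.backup = {}

    def attack(self):
        # Add a gradient-direction perturbation to the embedding parameters.
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.param_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Remove the perturbation after the adversarial backward pass.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical loop: loss.backward(); fgm.attack(); adv_loss.backward(); fgm.restore(); optimizer.step()
```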