With the rapid development of computing and artificial intelligence, natural language processing has gained new momentum and is gradually penetrating a wide range of fields. Medical care is an important field closely tied to people's lives, and named entity recognition (NER) in Chinese electronic medical records (EMRs) is both a popular and a difficult research topic. Several problems remain open in this area. First, because of cost and patient privacy concerns, the datasets available for training are too small to yield models with high accuracy and robustness. Second, Chinese EMRs are highly complex, and current single-task models neither generalize across nor remain compatible with different types of EMR datasets. To address these problems, this paper first reviews the major domestic and international NER techniques in the medical domain and points out their shortcomings. On this basis, it constructs a new set of medical NER datasets from the electronic medical record resources of tertiary hospitals. It then analyses the performance of three structurally different baseline models, BiLSTM-CRF, Transformer-CRF and BERT-CRF, and, in response to the problems above, proposes a Chinese EMR NER model based on multi-task learning and transfer learning. Compared with the traditional single-task classification model, the innovations of this paper are as follows: (1) Introducing and improving multi-task learning methods. A multi-task structure assigns a dedicated decoder to each category of Chinese EMR dataset, so that all datasets can be fed into the model and trained jointly, which effectively saves time and computing hardware resources. (2) Introducing and improving transfer learning methods. A shared encoder based on the BERT model allows common knowledge latent in different EMRs to be transferred and learned across datasets, and effectively prevents the model from overfitting and catastrophic forgetting. Finally, the experimental results demonstrate that the proposed algorithm has good compatibility and robustness, with better performance in terms of accuracy, recall and F1 score on all four datasets, especially on long-tailed data and small-scale datasets.
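
The shared-encoder, per-dataset-decoder structure described above can be illustrated with a minimal routing sketch. It is not the paper's implementation: the encoder and decoders are toy stand-ins for the BERT encoder and the dataset-specific decoders, no training is performed, and the names `MultiTaskTagger`, `mixed_batches`, the dataset names and the labels are all hypothetical. It shows only how batches from several datasets might be mixed and routed through one encoder to the matching decoder.

```python
import random
from typing import Callable, Dict, List, Tuple

Features = List[int]
Encoder = Callable[[str], Features]
Decoder = Callable[[Features], List[str]]


class MultiTaskTagger:
    """One shared encoder plus one decoder per dataset (routing only)."""

    def __init__(self, encoder: Encoder, decoders: Dict[str, Decoder]) -> None:
        self.encoder = encoder
        self.decoders = decoders

    def tag(self, dataset: str, sentence: str) -> List[str]:
        features = self.encoder(sentence)        # shared by every dataset
        return self.decoders[dataset](features)  # specific to this dataset


def mixed_batches(datasets: Dict[str, List[str]], batch_size: int,
                  seed: int = 0) -> List[Tuple[str, List[str]]]:
    """Cut each dataset into batches, label each batch with its dataset
    name, and shuffle the batches so all datasets are visited jointly."""
    batches = []
    for name, sentences in datasets.items():
        for start in range(0, len(sentences), batch_size):
            batches.append((name, sentences[start:start + batch_size]))
    random.Random(seed).shuffle(batches)
    return batches


# Toy stand-ins (assumptions): a real system would use BERT as the encoder
# and a trained decoder per dataset instead of these placeholders.
def toy_encoder(sentence: str) -> Features:
    return [ord(ch) for ch in sentence]  # one feature per character


def make_toy_decoder(label: str) -> Decoder:
    # Emits the same placeholder label for every input position.
    return lambda features: [label for _ in features]


tagger = MultiTaskTagger(
    toy_encoder,
    {
        "admission_notes": make_toy_decoder("B-SYMPTOM"),
        "discharge_summaries": make_toy_decoder("B-DRUG"),
    },
)
datasets = {
    "admission_notes": ["fever", "cough 3d"],
    "discharge_summaries": ["aspirin"],
}

for name, batch in mixed_batches(datasets, batch_size=1):
    for sentence in batch:
        print(name, sentence, tagger.tag(name, sentence))
```

Because every batch carries its dataset name, a single training loop could update the shared encoder on all batches while updating only the decoder that matches each batch, which is one plausible reading of the joint training the abstract describes.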