For a country with a population as large as China's, medical resources are vital to people's lives. With the development of Internet technology, more and more researchers are applying computer technology to medical research. Using named entity recognition to extract entities from medical text and build medical knowledge graphs can effectively improve the standard and efficiency of medical care. In this paper, we investigate named entity recognition, propose two recognition models based on pre-trained language models, and apply them to entity recognition in the medical field, aiming to improve the accuracy of entity recognition in this domain.

First, we study the BERT-BiLSTM-CRF model. It uses the pre-trained BERT model as an encoder to obtain character-level vector representations of the text, then uses a bidirectional LSTM to learn temporal information from the two directions of the sentence, and finally applies a conditional random field, a statistical probabilistic model, to capture deeper dependencies among the data and obtain more accurate predictions.
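As a concrete reference point, the following is a minimal PyTorch sketch of such a tagger. It is illustrative rather than the implementation used in this work: it assumes the Hugging Face transformers package for the BERT encoder and the third-party pytorch-crf package for the CRF layer, and the hidden size and tag-set size are placeholders.

```python
# Minimal BERT-BiLSTM-CRF sketch (illustrative; assumes the Hugging Face
# `transformers` and third-party `pytorch-crf` packages are installed).
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class BertBiLstmCrf(nn.Module):
    def __init__(self, num_tags: int, lstm_hidden: int = 256,
                 bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)   # character-level encoder
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)   # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)         # label-transition layer

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        hidden, _ = self.lstm(hidden)                      # context from both directions
        emissions = self.emit(hidden)
        mask = attention_mask.bool()
        if tags is not None:                               # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)       # inference: best tag paths
```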
Although BERT excels at text representation, in Chinese named entity recognition its input is still a sequence of character-level vectors, whereas in Chinese text the word is the smallest unit that expresses meaning. Character-level input avoids the noise that differing word segmentation criteria would introduce, but it discards the richer semantic information carried by Chinese words.

To address this, we propose a pre-trained model based on Tag Embedding and Simple Lexicon lexical enhancement with word information fusion, which introduces two kinds of information enhancement at different network levels to exploit lexical information in Chinese. Specifically, the Tag Embedding and Simple Lexicon lexical enhancement is applied at the word embedding layer, segmenting sentences along different dimensions to obtain a richer representation of the text, while the word-information-fusion enhancement is applied at the BERT encoding output layer, incorporating the word boundary information of the text into the encoded representation and increasing the amount of information it contains (a minimal sketch of this fusion idea appears after this summary). The model achieves satisfactory results on the Chinese medicine instruction dataset.

We then analyse the model and identify its shortcoming: although the introduction of word-level vectors enriches the character-level vectors, it encounters problems similar to those caused by feeding word-level vectors directly into the model, since differing word segmentation criteria and uneven segmentation quality introduce noise. We therefore propose a multi-task pre-trained model based on adversarial learning and network sharing, which performs joint multi-task learning with named entity recognition as the primary task and Chinese word segmentation as the auxiliary task. A shared encoding layer and a weight-gated neural network supply additional information to the named entity recognition task, while an adversarial learning module removes the noise that the specificity of the Chinese word segmentation task introduces into the shared information, improving the robustness of the model (see the adversarial sketch at the end). Compared with the baseline model RoBERTa-wwm-ext-base, our model improves the F1 score on the CMeEE dataset, and it also outperforms the pre-trained model based on Tag Embedding and Simple Lexicon lexical enhancement with word information fusion.
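The sketch below illustrates the word-information-fusion idea in simplified form, under stated assumptions: it follows the common SoftLexicon-style scheme of pooling matched lexicon words into Begin/Middle/End/Single (B/M/E/S) sets per character, and the dimensions, lookup, and projection are placeholders rather than the exact design of the proposed model.

```python
# Simplified SoftLexicon-style word-information fusion (illustrative; the
# lexicon lookup, pooling, and dimensions are placeholder assumptions).
import torch
import torch.nn as nn

class LexiconFusion(nn.Module):
    """Concatenate each character vector with pooled embeddings of the lexicon
    words that begin at (B), pass through (M), end at (E), or exactly match (S)
    that character position, then project back to the encoder width."""
    def __init__(self, char_dim: int = 768, word_dim: int = 50):
        super().__init__()
        self.proj = nn.Linear(char_dim + 4 * word_dim, char_dim)

    def forward(self, char_vecs, bmes_word_vecs):
        # char_vecs:      (B, T, char_dim)    e.g. BERT output
        # bmes_word_vecs: (B, T, 4, word_dim) mean-pooled matched-word embeddings
        #                 for the B/M/E/S sets (zeros when a set is empty)
        fused = torch.cat([char_vecs, bmes_word_vecs.flatten(2)], dim=-1)
        return self.proj(fused)
```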
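Finally, the following sketch shows the adversarial component only, again as an illustration under stated assumptions: the encoder, pooling, and discriminator are simplified stand-ins for the actual architecture. The key mechanism is a gradient reversal layer: a task discriminator learns to tell which task a shared feature came from, while the reversed gradient pushes the shared encoder toward task-invariant features, filtering segmentation-specific noise out of the shared information.

```python
# Gradient-reversal adversarial sketch for shared-private multi-task learning
# (illustrative; module sizes and names are placeholders, not the paper's code).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None  # flip gradients into the shared encoder

class AdversarialShared(nn.Module):
    def __init__(self, hidden: int = 768, num_tasks: int = 2, lam: float = 1.0):
        super().__init__()
        self.shared = nn.LSTM(hidden, hidden // 2, batch_first=True,
                              bidirectional=True)          # shared by NER and CWS
        self.discriminator = nn.Linear(hidden, num_tasks)  # guesses the source task
        self.lam = lam

    def forward(self, encoded):                            # encoded: (B, T, hidden)
        shared, _ = self.shared(encoded)
        pooled = shared.mean(dim=1)                        # sentence-level summary
        task_logits = self.discriminator(GradReverse.apply(pooled, self.lam))
        return shared, task_logits

# Training idea: add a cross-entropy loss over task_logits (task id as label)
# to the NER/CWS losses; the reversed gradient makes shared features task-invariant.
```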