In the big data era, information technology has opened up broader room and stronger momentum for development in many fields, and making traditional Chinese medicine (TCM) knowledge intelligent has become a general trend. How to effectively extract key information from massive TCM text data has therefore become a focus for many researchers. Named entity recognition is an important task in TCM text processing with broad prospects: recognizing and classifying the entities in TCM texts provides a sound basis for applications such as TCM diagnostic decision-making, TCM knowledge graph construction, and TCM diagnostic QA systems. This paper focuses on entity recognition in TCM instructions, and the main research content consists of the following parts.

(1) Data augmentation and corpus construction. The dataset of TCM instructions from the Tianchi Big Data Competition was preprocessed and then augmented by generating pseudo-labels, which expanded the training set to 1.5 times its original size. The dataset was split randomly in an 8:1:1 ratio and annotated with the BIO scheme to produce the final experimental dataset.

(2) Domain dictionary construction and introduction of dictionary features. TCM domain knowledge was collected from the Internet with a web crawler and used to build a TCM domain dictionary. The dictionary is encoded into the sequence representation, which introduces dictionary features into entity recognition for TCM instructions and provides richer word-level information for model training.

(3) Model optimization based on parallel multi-head attention and adversarial training. The sequence representation with dictionary features is fed in parallel into a BiLSTM, which captures contextual information, and a multi-head attention mechanism, which attends to the semantic information of local words. The outputs of the two are then combined and passed to the sequence prediction layer. Compared with a serial arrangement of the two components, this parallel design improves model performance considerably. Perturbations are then added through adversarial training to further improve the robustness of the model.

Finally, this paper designs and develops a corresponding entity recognition system for TCM instructions, and experiments on the TCM instructions dataset verify the effectiveness of the proposed method.
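As a rough illustration of the corpus-construction step, the sketch below shows character-level BIO tagging and a random 8:1:1 split. It is not the paper's implementation: the function names `bio_tags` and `split_8_1_1`, the entity label `SYMPTOM`, the example sentence, the character-level granularity, and the fixed seed are all assumptions made for the example.

```python
import random


def bio_tags(text, entities):
    """Return one BIO tag per character of `text`.

    `entities` is a list of (start, end, label) spans with `end` exclusive.
    The span format and label names are hypothetical.
    """
    tags = ["O"] * len(text)
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags


def split_8_1_1(samples, seed=0):
    """Shuffle `samples` and split them into three parts in an 8:1:1 ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)      # seeded for reproducibility
    n_first = len(items) * 8 // 10
    n_second = len(items) // 10
    first = items[:n_first]
    second = items[n_first:n_first + n_second]
    third = items[n_first + n_second:]      # remainder goes to the last part
    return first, second, third


if __name__ == "__main__":
    sentence = "用于感冒发热"
    print(list(zip(sentence, bio_tags(sentence, [(2, 4, "SYMPTOM")]))))
    parts = split_8_1_1(range(100))
    print([len(p) for p in parts])
```

Integer arithmetic is used for the split sizes so that rounding never drops a sample; any remainder falls into the last part.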