| Traditional Chinese medicine (TCM) has a large body of literature that contains valuable theoretical knowledge and practical experience and covers a wealth of medical knowledge, so research on named entity recognition in the TCM domain is of great value. In recent years, as the status of TCM in China has risen, medical applications such as TCM clinical decision support, assisted treatment, personalized knowledge analysis, and medication pattern mining have become increasingly important. TCM therefore has very broad development prospects, and the medical knowledge it contains has enormous potential. Because most of the medical literature in the TCM domain derives from ancient medical texts, most TCM texts are written in classical Chinese. Compared with modern texts, classical Chinese texts are more obscure, more concise, and more general in their semantics, and polysemous and ambiguous words are more common. At present, the difficulty of named entity recognition in the TCM domain lies in handling polysemous words and disambiguating ambiguous ones; this polysemy and ambiguity is also the main factor limiting recognition accuracy. In addition, the lack of comprehensive dictionaries for TCM and the lack of standardization and organization in the process of converting TCM texts make it very difficult to handle specialized medical terms such as diseases, drugs, and prescriptions. To address these problems, this thesis adopts a deep learning approach to named entity recognition in TCM. The main research contents are as follows: (1) TCM named entity recognition based on word embedding models is studied using a TCM formula dataset that contains 10,989 records and six entity types. Two word embedding models, Word2Vec and BERT, are used for pretraining, and their outputs serve as input to the BiLSTM-CRF model to perform the
named entity recognition task. Comparative experiments show that the BiLSTM model based on BERT word embeddings performs better, with accuracy and F1 values of 86.52% and 86.69%, respectively. (2) The introduction of a label attention network into TCM named entity recognition is investigated to further improve the accuracy of named entity recognition on TCM medical texts. The label attention network (LAN) replaces the CRF layer for sequence labeling, yielding the BiLSTM-LAN model. LAN can capture potential long-term dependencies between labels, which gives the model stronger feature extraction capabilities and can further improve named entity recognition. The experimental results show that the BiLSTM-LAN model achieves an entity recognition accuracy and F1 value of 93.76% and 93.82%, respectively, a higher overall recognition rate. |
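
The idea of labeling tokens by attending over label representations, rather than decoding with a CRF, can be illustrated with a minimal sketch. It assumes a single layer of scaled dot-product attention between token hidden states and label embeddings, and it uses hand-picked toy vectors in place of trained BiLSTM outputs. The tag names (`O`, `B-HERB`, `I-HERB`) and the function names are hypothetical and do not come from the thesis; this is not the thesis's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(hidden_states, label_embeddings):
    """Return, for each token, an attention distribution over labels.

    hidden_states: one vector per token (stand-in for BiLSTM outputs).
    label_embeddings: one vector per label, same dimensionality.
    """
    scale = math.sqrt(len(label_embeddings[0]))
    distributions = []
    for h in hidden_states:
        # Scaled dot product between the token vector and every label vector.
        scores = [
            sum(a * b for a, b in zip(h, emb)) / scale
            for emb in label_embeddings
        ]
        distributions.append(softmax(scores))
    return distributions


def predict_labels(hidden_states, label_embeddings, labels):
    """Pick the label with the highest attention weight for each token."""
    predictions = []
    for weights in label_attention(hidden_states, label_embeddings):
        best = max(range(len(weights)), key=weights.__getitem__)
        predictions.append(labels[best])
    return predictions


# Toy data (assumed for illustration only).
LABELS = ["O", "B-HERB", "I-HERB"]
LABEL_EMBEDDINGS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
HIDDEN_STATES = [
    [0.1, 2.0, 0.3],
    [0.2, 0.1, 1.5],
    [1.8, 0.0, 0.2],
]

if __name__ == "__main__":
    print(predict_labels(HIDDEN_STATES, LABEL_EMBEDDINGS, LABELS))
```

In a trained model both the hidden states and the label embeddings would be learned, and the attention output would typically be passed on to further layers rather than used directly for the final prediction.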