Objective: Traditional Chinese medicine (TCM) medical records document practitioners' diagnosis and treatment based on syndrome differentiation and the identification of syndrome elements, and they preserve the clinical experience handed down in TCM. Extracting named entities from medical records written in natural language, and structuring those records, is the foundation for using machine learning, deep learning, and related technologies to mine TCM knowledge in depth. After more than 2000 years of development, TCM medical records include a large number of ancient texts and are characterized by free-form writing, rich terminology, and complex structure, all of which make named entity extraction difficult. This study uses named entity recognition to extract key information such as symptoms, prescriptions, and syndromes from medical records, and structures the records to provide clinical data for later data mining, knowledge graph construction, and other applications.

Methods: Our research team extracted terms from the electronic version of the "Chinese Modern Famous Traditional Chinese Medicine Medical Record Essence Series" and built a terminology dictionary for automatic annotation of medical record texts. We propose a dictionary-based bidirectional maximum matching method for corpus annotation, which applies BIO tags to body parts, symptoms, diseases, drugs, etc. in 400 medical records; all other content is labeled O. The annotated text was then manually verified to ensure annotation accuracy, yielding an annotated corpus of over 50,000 words. To improve the recognition rate of named entities and to analyze the radical characteristics relevant to identifying TCM named entities, static web crawling was used to collect character radicals and build a radical dictionary. Finally, a deep learning named entity recognition model was constructed with BERT as the foundation model, combined with LSTM, CRF, and radical features. The BERT-LSTM-CRF radical feature model embeds Chinese character radicals into the BERT word vectors and trains them jointly, then uses BiLSTM for feature extraction and CRF for sequence prediction.

Results: In experiments on the labeled medical record data, the proposed model achieves a precision of 85.72%, a recall of 86.62%, and an F1 value of 86.17%. Compared with the BERT, BERT-CRF, and BERT-BiLSTM-CRF models, its F1 value is higher by 2.68%, 2.47%, and 1.48%, respectively. The proposed model performs best of the four in medical record entity recognition.

Conclusion: The experimental results show that embedding radical features improves the P value, R value, and F1 value of the model. Radical features are closely related to the entities in medical records, and embedding them makes the model more targeted for entity recognition, so it can be used to recognize named entities in TCM medical records. Compared with the other methods, the proposed method achieves better entity recognition performance and provides new ideas for knowledge discovery in TCM medical records.
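
The dictionary-based annotation step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the entity type names (`BODY`, `SYMPTOM`), the function names, and the tie-breaking rule between the forward and backward passes (more characters covered, then fewer spans, then the backward result) are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def forward_max_match(text: str, lexicon: Dict[str, str], max_len: int) -> List[Span]:
    """Scan left to right, always taking the longest dictionary term at each position."""
    spans, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in lexicon:
                spans.append((i, i + size, lexicon[word]))
                i += size
                break
        else:  # no dictionary term starts here; skip one character
            i += 1
    return spans


def backward_max_match(text: str, lexicon: Dict[str, str], max_len: int) -> List[Span]:
    """Scan right to left, always taking the longest dictionary term ending at each position."""
    spans, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            word = text[j - size:j]
            if word in lexicon:
                spans.append((j - size, j, lexicon[word]))
                j -= size
                break
        else:  # no dictionary term ends here; skip one character
            j -= 1
    spans.reverse()
    return spans


def bidirectional_max_match(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Run both passes and pick one result (assumed heuristic, see lead-in)."""
    max_len = max((len(w) for w in lexicon), default=0)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)

    def score(spans: List[Span]) -> Tuple[int, int]:
        covered = sum(end - start for start, end, _ in spans)
        return (covered, -len(spans))  # more coverage first, then fewer spans

    return fwd if score(fwd) > score(bwd) else bwd  # ties go to the backward pass


def bio_tags(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Produce one BIO tag per character; unmatched characters are tagged O."""
    tags = ["O"] * len(text)
    for start, end, label in bidirectional_max_match(text, lexicon):
        tags[start] = f"B-{label}"
        for k in range(start + 1, end):
            tags[k] = f"I-{label}"
    return tags


# Hypothetical toy lexicon; the real dictionary is built from the medical record series.
toy_lexicon = {"小腹": "BODY", "腹痛": "SYMPTOM", "头痛": "SYMPTOM", "发热": "SYMPTOM"}
print(bio_tags("小腹痛", toy_lexicon))  # ['O', 'B-SYMPTOM', 'I-SYMPTOM']
```

In the toy input "小腹痛", the forward pass matches "小腹" and the backward pass matches "腹痛". Dictionary matching alone cannot say which reading is right, which is one reason a manual verification pass after automatic annotation, as the Methods describe, is useful.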