The development of computer information technology has produced a huge amount of medical data. As the most important part of the clinical information system, electronic medical records document patient diagnosis and treatment in detail and contain many valuable clinical resources. Free text in electronic medical records exists in semi-structured and unstructured form. To extract the useful information contained in these records, it is essential to apply natural language processing for text mining. Named entity recognition is a basic task of text mining, so recognizing medical entities in electronic clinical records is meaningful. This study constructed a small medical domain dictionary and combined it with a conditional random field (CRF) to perform labeling at two different granularities. It recognized diseases, symptoms, operations and drugs in Chinese electronic medical records. The recognition performance of deep neural networks is also analyzed. The main work of this study is as follows.
(1) This paper constructs a domain dictionary by statistically extracting keywords from Chinese electronic medical records and supplementing them with external professional resources. It also annotates a Chinese named entity corpus for electronic medical records.
(2) This paper proposes a double-layer annotation model (DLAM) that combines the domain dictionary with CRF to perform labeling at two different granularities. The manually constructed medical domain dictionary recognizes registered words with extremely high accuracy, while machine learning can automatically recognize unregistered words. DLAM integrates the two approaches to exploit both advantages. On the test dataset, it achieves a Macro-P of 96.7%, a Macro-R of 97.7% and a Macro-F1 of 97.2%.
(3) This paper also compares DLAM with deep learning on this task. A domain corpus is used to pre-train and fine-tune existing deep learning models. Based on BiLSTM-CRF and Transformer-CRF, the differences between DLAM and deep learning are discussed.
(4) This paper demonstrates that DLAM is broadly applicable to Chinese electronic medical records. It then uses DLAM to recognize clinical entities in pediatric medical records and identifies common morbidity characteristics from the recognition results.
The DLAM proposed in this study performs well on named entity recognition in Chinese electronic medical records. It can recognize clinical entities in unstructured electronic medical record text efficiently and quickly. This study lays the foundation for further medical information extraction.
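The abstract does not say how the macro-averaged scores are computed. As a minimal sketch, assuming one common convention, precision and recall are computed per entity type, averaged across types, and Macro-F1 is taken as the harmonic mean of the two averages. The function names and the toy counts below are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def macro_precision_recall(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float]:
    """Average per-type precision and recall.

    counts maps an entity type to (true positives, false positives, false negatives).
    """
    precisions, recalls = [], []
    for tp, fp, fn in counts.values():
        # Guard against empty denominators for types with no predictions or no gold entities.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    n = len(counts)
    return sum(precisions) / n, sum(recalls) / n


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (assumed Macro-F1 convention)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical toy counts for two entity types, for illustration only.
toy_counts = {"disease": (90, 10, 10), "drug": (80, 20, 0)}
macro_p, macro_r = macro_precision_recall(toy_counts)
macro_f1 = f1_from(macro_p, macro_r)

# Applying the same harmonic mean to the reported Macro-P and Macro-R.
reported_f1 = f1_from(96.7, 97.7)
```

Under this assumption, the harmonic mean of the reported Macro-P (96.7%) and Macro-R (97.7%) rounds to the reported Macro-F1 (97.2%). The reported figures are therefore consistent with this convention, although the study may have computed them differently.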