Research On Chinese Electronic Medical Record Named Entity Recognition Based On CCRF-AL Method

Posted on: 2020-09-11  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Chen
GTID: 2404330602961600  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, China’s medical software and hardware facilities have steadily matured, and Hospital Information Systems (HIS) have been widely adopted across medical institutions, accumulating a large volume of Electronic Medical Record (EMR) data. EMRs record actual clinical diagnosis and treatment and therefore have high medical research value. Information extraction technology can obtain the data needed for research from large collections of EMR text, and the Named Entity Recognition (NER) task is the foundation and key of information extraction. However, EMR texts contain a large amount of private information, and no large-scale public corpus is currently available for research. This lack of research corpora hinders the development of NER research in the medical field in China. The characteristics of Chinese language symbols, of EMR text, and of medical entities further increase the difficulty of NER on Chinese EMR text.

To improve entity recognition with small-scale training data, this thesis analyzes Chinese EMR text and entity features and constructs a feature set consisting of word features, part-of-speech features, context features, word boundary features and entity identification words. A Cascaded Conditional Random Fields (CCRF) model is built on these features.

To maintain model performance while reducing the amount of training data and the manual labelling workload, the uncertainty-based Active Learning (AL) method is improved. The CCRF model is first trained on the initial training data. Unlabelled data with a Flag value greater than 0.5 is selected and added to a candidate pool. Data in the candidate pool with a similarity value less than 1 is then selected for manual labelling and added to the training data. The CCRF model is retrained on the updated training data, and iteration stops once the F value of the recognition result changes by less than 0.5.

The method improves entity recognition with small-scale training data and identifies disease name, drug name and symptom name entities in Chinese EMR text, with F values of 84.66%, 91.35% and 92.41%, respectively. Better entity recognition improves the results of information extraction and can promote the development of natural language understanding in the medical field.
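
The active learning procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation. The abstract does not define how the Flag value or the similarity value is computed, so both are passed in as hypothetical callables (`flag_for`, `similarity`). The sketch also assumes that similarity is measured against samples already chosen in the same round, and that the 0.5 stopping threshold is in the same units as the reported F value. `train_and_score` and `annotate` are likewise placeholder names standing in for CCRF training and manual labelling.

```python
from typing import Any, Callable, List, Sequence, Tuple


def select_batch(
    unlabelled: Sequence[Any],
    flag: Callable[[Any], float],
    similarity: Callable[[Any, Any], float],
    flag_threshold: float = 0.5,  # threshold stated in the abstract
    sim_threshold: float = 1.0,   # threshold stated in the abstract
) -> List[Any]:
    """Two-stage selection: uncertainty filter, then similarity filter."""
    # Stage 1: samples whose Flag value exceeds the threshold form the candidate pool.
    candidates = [s for s in unlabelled if flag(s) > flag_threshold]
    # Stage 2 (ASSUMPTION): keep a candidate only if its similarity to every
    # sample already chosen in this round is below the threshold.
    chosen: List[Any] = []
    for s in candidates:
        if all(similarity(s, c) < sim_threshold for c in chosen):
            chosen.append(s)
    return chosen


def active_learning(
    labelled: Sequence[Any],
    unlabelled: Sequence[Any],
    train_and_score: Callable[[List[Any]], Tuple[Any, float]],  # hypothetical: returns (model, F value)
    flag_for: Callable[[Any], Callable[[Any], float]],          # hypothetical: model -> Flag scorer
    similarity: Callable[[Any, Any], float],                    # hypothetical similarity measure
    annotate: Callable[[Any], Any],                             # hypothetical manual labelling step
    f_delta: float = 0.5,
    max_rounds: int = 50,  # safety cap, not mentioned in the abstract
) -> Tuple[Any, List[float]]:
    """Retrain until the F value changes by less than f_delta between rounds."""
    labelled, pool = list(labelled), list(unlabelled)
    model, f_prev = train_and_score(labelled)
    history = [f_prev]
    for _ in range(max_rounds):
        batch = select_batch(pool, flag_for(model), similarity)
        if not batch:
            break  # nothing left that passes both filters
        labelled.extend(annotate(s) for s in batch)
        # Drop the chosen samples (and exact duplicates of them) from the pool.
        pool = [s for s in pool if s not in batch]
        model, f_new = train_and_score(labelled)
        history.append(f_new)
        if abs(f_new - f_prev) < f_delta:
            break  # stopping criterion from the abstract
        f_prev = f_new
    return model, history
```

Passing the scoring and labelling steps as callables keeps the loop independent of any particular CRF toolkit. A real setup would plug in the model's own uncertainty measure in place of `flag_for` and a human annotator in place of `annotate`.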
Keywords/Search Tags: Chinese electronic medical record, Named entity recognition, Cascaded conditional random field model, Active learning method