
Deep Learning-based Recognition Of Named Entities In Chinese Electronic Medical Records

Posted on: 2024-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Zhao
GTID: 2544306923962719
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Objective: With the rapid growth of intelligent technology, information technology has become closely integrated with the medical industry. Most health institutions have largely adopted information management and have accumulated a huge volume of medical records, which document the whole process from patient consultation to recovery and contain rich information about medical activities and medical knowledge. Using electronic medical record information to build medical applications such as intelligent auxiliary diagnosis systems and intelligent medical service platforms can promote the further development of the medical industry. Named entity recognition is the first step toward making effective use of clinical text. A large number of named entity recognition algorithms exist, built from three perspectives: extraction method, annotation method, and dimensional representation. They provide ideas and experience for research on named entity recognition, but they also have shortcomings. For example, errors in the extraction process accumulate; sequence annotation is slow to compute and runs into the problem of overlapping entities; and single-dimension approaches lack semantic information and suffer from the out-of-vocabulary (unregistered word) problem. In addition, the special nature of the medical industry means that public medical datasets are relatively small, and Chinese electronic medical records differ from English ones: they not only suffer from blurred Chinese word boundaries, obscure word features, and polysemy, but also contain a large number of medical terms and special expressions. The study of named entity recognition in Chinese electronic medical records therefore has both research significance and practical importance for the development of the medical industry. Accordingly, this paper explores named entity recognition
for Chinese electronic medical records.

Methods: Because deep learning algorithms extract features automatically, they avoid the errors of manual feature extraction, and this paper proposes two deep learning-based methods for named entity recognition in Chinese electronic medical records. First, a small medical database is constructed. Because few datasets are publicly available and each uses a different data format, the cEHRNER, cMedQANER, YIDU-S4K and CHIP2020 datasets are converted to a unified text format so that subsequent experiments run smoothly. Second, to address blurred Chinese word boundaries, error-prone word segmentation, the limited semantic information carried by individual characters, and polysemy, this paper proposes, from the perspective of dimensional representation, a model that enhances word information and contextual features on the basis of dual-dimension fused word features. Feature vectors are extracted through two modules, a weighted statistical word set and the BERT model, to enhance the embedding of word information, avoid word segmentation errors, and resolve polysemy. A graph attention network is added to the encoding layer, where graph attention evaluates the importance of adjacent characters; this strengthens the model's ability to learn contextual relationships in the text and ensures that the semantic and grammatical features of the text data are preserved, improving the learning effect of the network model. Finally, to address the large number of medical terms and specialized expressions, this paper proposes a pre-training-based network model built on sequence annotation. Drawing on the transfer-learning-like capability and the strong text representation capability of the pre-trained model, model performance is improved by learning corpus knowledge in advance. Community website data such as medical encyclopedia entries, medical question answers and
electronic medical records are collected, cleaned, and fed into a base model built on the BERT framework so that the model is forced to learn medical knowledge; the newly trained model is then paired with a bidirectional long short-term memory (BiLSTM) network. Within this architecture, the new model acts as the word embedding layer, dynamically generating medically relevant text features, and medical entity recognition is accomplished together with the BiLSTM network, which retains the context of the sequence.

Results: The named entity recognition method with enhanced word information and contextual features achieved F1 scores of 85.57%, 83.97%, and 83.52% on the cEHRNER, cMedQANER, and YIDU-S4K datasets, respectively. On the cMedQANER dataset this is an improvement of 0.85% over the BERT model and of 0.51% over the MC-BERT medical model. The medical pre-trained model approach achieved F1 scores of 87.63% and 85.62% on the YIDU-S4K and CHIP2020 datasets, respectively, whereas the standard BERT model in the same network achieved 77.08% and 79.54%; the model studied in this paper thus improves on it by about 10.5 and 6.1 percentage points, respectively. On the YIDU-S4K dataset, the result is 4.11 percentage points higher than the method in the previous chapter and about 2 percentage points higher than the best F1 value of 85.62% in the academic review [1]. In addition, when the medical language model was used in the first entity recognition method, cEHRNER, cMedQANER, and YIDU-S4K improved by 1.43%, 0.79%, and 0.76%, respectively, indicating the good performance of the medical language model.

Conclusion: This paper proposes two entity recognition methods, one focused on retaining text features and the other on learning medical domain knowledge. The experimental results verify their robustness and effectiveness, contributing to research on Chinese electronic medical record named entity
recognition. However, these models still have problems, such as slow processing speed and word-set word vector representations that are better suited to the general domain. Future work will continue to address these problems. Further research directions and goals are to put Chinese electronic medical record named entity recognition into practice, build a medical knowledge graph, design an intelligent medical service system, and bring medical applications into real life.
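The abstract reports all results as F1 scores but does not say how they are computed. As a rough illustration only, the sketch below assumes the common convention for sequence-annotation NER: character-level BIO tags, with an entity counted as correct only when its boundaries and type both match exactly. The function names and the DIS and DRUG labels are hypothetical and are not taken from the thesis or its datasets.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (assumed tag scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((start, i, etype))
        start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An "I-" tag with no matching open entity is ignored (strict reading).
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged, exact-match entity-level F1 over a list of sentences."""
    gold_spans, pred_spans = set(), set()
    for idx, (g_tags, p_tags) in enumerate(zip(gold, pred)):
        gold_spans |= {(idx,) + s for s in bio_to_spans(g_tags)}
        pred_spans |= {(idx,) + s for s in bio_to_spans(p_tags)}
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: DIS (disease) and DRUG are illustrative only.
gold = [["B-DIS", "I-DIS", "O", "B-DRUG", "I-DRUG", "I-DRUG"]]
pred = [["B-DIS", "I-DIS", "O", "B-DRUG", "I-DRUG", "O"]]
print(entity_f1(gold, pred))
```

In this example the predicted drug entity is one character too short, so under exact matching it counts as both a false positive and a false negative. This is one reason blurred word boundaries in Chinese text weigh heavily on the reported scores.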
Keywords/Search Tags: named entity recognition, word vector, BERT model, Graph Attention Networks, Chinese electronic medical record