
Research On Medical Named Entity Recognition Based On Multi-feature

Posted on: 2024-06-19 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Gan
GTID: 2544306917990539 | Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, the application of computer technology in the medical field is receiving increasing attention from the academic community. As a populous country, China has rich medical resources and generates a large amount of medical text data. How to combine this medical text with advanced computing techniques, so as to raise the level of informatization in medicine and make medical knowledge retrieval more efficient, has become a research focus. It is therefore natural to apply named entity recognition (NER), a core task in natural language processing, to the medical field. Medical entities extracted by NER can be used to build medical knowledge graphs, which make it faster to organize complex medical knowledge. To further improve medical entity recognition, this thesis studies NER methods with three goals: improving the precision of entity recognition in the medical domain, shortening model training time, and improving the precision of nested medical entity recognition. The specific research contents are as follows:

(1) To address the low accuracy of named entity recognition in the medical field, this thesis proposes a new word embedding representation. First, a simple recurrent neural network is trained on medical text to obtain character-level vector representations. Next, the GloVe model is used to obtain word-level vector representations of the medical text. Finally, a pre-trained BERT model dynamically generates contextual vector representations, and the three kinds of vectors are concatenated. Experimental results show that, compared with traditional Word2Vec word embeddings, the proposed multi-level feature fusion embedding model improves precision, recall, and F1 score on the GENIA and NCBI-disease datasets.

(2) To address the long training time of named entity recognition models, a new medical NER model is proposed that combines gated recurrent units (GRUs) with GlobalPointer. Compared with the LSTM, the GRU simplifies the gating structure inside each unit, using two gates (update and reset) instead of three. Compared with a conditional random field (CRF), GlobalPointer scores candidate spans with a global view of the context and avoids the complex recursive computation of the CRF. Together, these two changes shorten training: on the GENIA and NCBI-disease datasets, the proposed method reduces the model's training time by 22% to 30%. The thesis also studies the effect of RoPE (rotary position embedding) on GlobalPointer; experiments show that adding RoPE improves the performance of the GlobalPointer model by about 9.41% on average.

(3) To address the difficulty and low accuracy of nested entity recognition in medical NER, a model based on a hierarchical bidirectional GRU network and a text convolutional neural network (TextCNN) is proposed. First, the TextCNN extracts local features of the text, compensating for the GRU's weaker ability to capture local features. Second, GRU layers are stacked hierarchically, with each layer handling one level of nesting, to improve the recognition accuracy of nested entities. Experimental results show that the proposed model achieves high accuracy on the GENIA, ACE2005, and BC2GM datasets...
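The multi-level feature fusion embedding concatenates, for every token, a character-level vector, a GloVe word vector and a contextual BERT vector. The sketch below shows only that concatenation step under stated assumptions: "simple recurrent neural network" is read as an Elman-style RNN whose last hidden state serves as the character-level word vector (the thesis may use a different variant), all dimensions are hypothetical, and the GloVe table and BERT outputs are random stand-ins rather than trained vectors.

```python
import numpy as np

# Hypothetical dimensions; the abstract does not state the real ones.
D_CHAR, D_GLOVE, D_BERT = 16, 50, 768
N_CHARS = 128  # toy character vocabulary: index = ord(ch) % 128

rng = np.random.default_rng(0)


class SimpleRNN:
    """Elman-style RNN over characters: h_t = tanh(W e_t + U h_{t-1} + b)."""

    def __init__(self, d_in, d_hidden, rng):
        self.emb = rng.normal(0, 0.1, (N_CHARS, d_in))
        self.W = rng.normal(0, 0.1, (d_hidden, d_in))
        self.U = rng.normal(0, 0.1, (d_hidden, d_hidden))
        self.b = np.zeros(d_hidden)

    def encode(self, word):
        h = np.zeros_like(self.b)
        for ch in word:
            e = self.emb[ord(ch) % N_CHARS]
            h = np.tanh(self.W @ e + self.U @ h + self.b)
        return h  # last hidden state used as the character-level word vector


def fuse(tokens, char_rnn, glove, bert_vectors):
    """Concatenate [char-level ; GloVe ; BERT] vectors for each token."""
    rows = []
    for i, tok in enumerate(tokens):
        char_vec = char_rnn.encode(tok)
        word_vec = glove.get(tok.lower(), np.zeros(D_GLOVE))  # zero vector for OOV words
        rows.append(np.concatenate([char_vec, word_vec, bert_vectors[i]]))
    return np.stack(rows)


# Toy stand-ins: a real pipeline would load trained GloVe vectors and take
# BERT's last hidden layer, with word pieces pooled back to whole words.
tokens = ["IL-2", "gene", "expression"]
glove = {w: rng.normal(size=D_GLOVE) for w in ["gene", "expression"]}
bert_vectors = rng.normal(size=(len(tokens), D_BERT))
char_rnn = SimpleRNN(d_in=8, d_hidden=D_CHAR, rng=rng)

fused = fuse(tokens, char_rnn, glove, bert_vectors)
print(fused.shape)  # (3, 834)
```

Each fused row has 16 + 50 + 768 = 834 dimensions here. The character-level part still gives a non-zero signal for an out-of-vocabulary token such as "IL-2", which is one plausible motivation for combining the three levels.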
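GlobalPointer treats NER as scoring every (start, end) pair for every entity type. The sketch below assumes a common formulation: per-type query/key projections of the encoder outputs, score $s_\alpha(i,j) = q_{i,\alpha}^\top k_{j,\alpha} / \sqrt{d}$, spans with end before start masked out, and RoPE applied to queries and keys so that the score depends on the relative offset $j - i$. The encoder outputs are random stand-ins for BiGRU states, all sizes and the zero decision threshold are hypothetical, and the training loss is omitted; this is not the thesis's exact configuration.

```python
import numpy as np


def rope(x, base=10000.0):
    """Rotary position embedding: rotate each dim pair (2k, 2k+1) of the
    vector at position m by the angle m * base**(-2k/d)."""
    n, d = x.shape
    theta = base ** (-np.arange(0, d, 2) / d)          # (d/2,)
    ang = np.arange(n)[:, None] * theta[None, :]       # (n, d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    even, odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = even * cos - odd * sin
    out[:, 1::2] = even * sin + odd * cos
    return out


def global_pointer_scores(h, Wq, Wk, use_rope=True):
    """Span scores s[a, i, j] for entity type a covering tokens i..j.

    h: (n, d_model) encoder outputs; Wq, Wk: (num_types, d_model, d_head).
    """
    n = h.shape[0]
    d_head = Wq.shape[2]
    s = np.empty((Wq.shape[0], n, n))
    for a in range(Wq.shape[0]):
        q, k = h @ Wq[a], h @ Wk[a]
        if use_rope:
            q, k = rope(q), rope(k)        # q_i . k_j now depends on j - i
        s[a] = q @ k.T / np.sqrt(d_head)   # all n*n spans in one matrix product
    s[:, np.tril(np.ones((n, n), dtype=bool), k=-1)] = -np.inf  # end < start is invalid
    return s


def decode(s, threshold=0.0):
    """Return (type, start, end) for every span whose score exceeds threshold."""
    return [tuple(int(v) for v in idx) for idx in np.argwhere(s > threshold)]


rng = np.random.default_rng(0)
n, d_model, d_head, num_types = 7, 32, 16, 2
h = rng.normal(size=(n, d_model))                  # stand-in for BiGRU outputs
Wq = rng.normal(0, 0.2, (num_types, d_model, d_head))
Wk = rng.normal(0, 0.2, (num_types, d_model, d_head))
scores = global_pointer_scores(h, Wq, Wk)
spans = decode(scores)
```

Because all span scores come from one matrix product per type, there is no sequential forward-algorithm pass as in a CRF; this is plausibly part of where a training-time saving like the one reported can come from. Nested spans are also allowed naturally, since each (start, end) pair is scored independently.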
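The nested-entity model combines TextCNN local features with hierarchically stacked bidirectional GRU layers, one layer per nesting level. The sketch below is a simplified, untrained illustration under assumptions of our own: the GRU uses one common two-gate formulation (update gate z, reset gate r), the TextCNN uses same-length 1-D convolutions with hypothetical widths 2 and 3 and a ReLU, and each level simply reads the previous level's hidden states. Layered models in the literature often merge the representations of entities detected at one level before the next; the thesis's exact inter-layer scheme is not given in the abstract, and all names and sizes here are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """One-direction GRU with an update gate z and a reset gate r."""

    def __init__(self, d_in, d_h, rng):
        self.W = rng.normal(0, 0.1, (3, d_h, d_in))   # input weights for z, r, candidate
        self.U = rng.normal(0, 0.1, (3, d_h, d_h))    # recurrent weights for z, r, candidate
        self.b = np.zeros((3, d_h))

    def run(self, xs):
        h, out = np.zeros(self.b.shape[1]), []
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1 - z) * cand + z * h                 # interpolate new and old state
            out.append(h)
        return np.stack(out)


class BiGRU:
    def __init__(self, d_in, d_h, rng):
        self.fw, self.bw = GRU(d_in, d_h, rng), GRU(d_in, d_h, rng)

    def run(self, xs):
        back = self.bw.run(xs[::-1])[::-1]             # right-to-left pass, re-aligned
        return np.concatenate([self.fw.run(xs), back], axis=1)


def text_cnn(xs, kernels):
    """Same-length 1-D convolutions + ReLU; each kernel is (width, d_in, d_out)."""
    n, feats = xs.shape[0], []
    for K in kernels:
        w = K.shape[0]
        padded = np.pad(xs, ((w // 2, w - 1 - w // 2), (0, 0)))
        conv = np.stack([np.einsum("wi,wio->o", padded[t:t + w], K) for t in range(n)])
        feats.append(np.maximum(conv, 0.0))
    return np.concatenate(feats, axis=1)


class LayeredNestedTagger:
    """Hypothetical sketch: BiGRU layer l predicts tags for nesting level l."""

    def __init__(self, d_emb, d_h, n_layers, n_tags, rng, widths=(2, 3), d_conv=8):
        self.kernels = [rng.normal(0, 0.1, (w, d_emb, d_conv)) for w in widths]
        d_in = d_emb + d_conv * len(widths)            # token embedding + local CNN features
        self.layers, self.heads = [], []
        for _ in range(n_layers):
            self.layers.append(BiGRU(d_in, d_h, rng))
            self.heads.append(rng.normal(0, 0.1, (2 * d_h, n_tags)))
            d_in = 2 * d_h                             # next level reads this level's states

    def predict(self, emb):
        x = np.concatenate([emb, text_cnn(emb, self.kernels)], axis=1)
        tags_per_level = []
        for bigru, head in zip(self.layers, self.heads):
            x = bigru.run(x)
            tags_per_level.append((x @ head).argmax(axis=1))  # untrained, so ids are arbitrary
        return tags_per_level


rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 20))                         # 6 tokens, 20-dim stand-in embeddings
model = LayeredNestedTagger(d_emb=20, d_h=16, n_layers=3, n_tags=5, rng=rng)
tags = model.predict(emb)                              # one tag sequence per nesting level
```

With a BIO-style tag set, level 0 would mark the innermost entities (for example a protein name) and a higher level the enclosing ones (for example a longer DNA-region mention containing it). The convolution widths control how much local context each token's CNN feature sees.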
Keywords/Search Tags: Medical named entity recognition, multi-feature fusion, GlobalPointer, nested entity, BERT