Research On Named Entity Recognition Based On Global Information

Posted on: 2024-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jia
Full Text: PDF
GTID: 2568307127454144
Subject: Computer Science and Technology
Abstract/Summary:
Chinese Named Entity Recognition (CNER) is a subtask of information extraction. Its purpose is to extract predefined entities from sequential text, such as personal names, geopolitical entities, job titles, and other proper nouns. As a cornerstone of natural language processing, CNER underpins the development of question answering systems and machine translation. Efficient information retrieval and knowledge graph applications have also benefited from its rapid progress. In recent years, named entity recognition (NER) has been studied extensively. In particular, the emergence of Lattice LSTM in 2018 established the main direction for CNER: integrating lexical information into character-level models, because extracting entities after word segmentation easily introduces segmentation errors. Although NER is becoming increasingly mature, it still faces many challenges, such as blurred word boundaries, loss of global semantics, and word ambiguity. To address these problems, this thesis builds on character-level named entity recognition methods and studies them from the perspectives of vocabulary enhancement and global semantic disambiguation. The main research work and contributions are as follows.

(1) This thesis proposes a dual-module interaction method based on a rethinking mechanism. The method extracts higher-level semantic information while introducing rich lexical information, and the effective fusion of the two alleviates word ambiguity to a large extent. The model consists of three parts: a global semantic extraction module (GM), a global and local feature extraction module (GLM), and the rethinking mechanism. First, the GM module obtains the overall semantics from a global perspective and then integrates this global background information into the lexical enhancement process in GLM. In particular, the global information can guide GLM with both common and distinguishing information. The subsequent rethinking mechanism rebalances the lexical information represented by local word combinations against the global semantic information to obtain more reasonable high-level semantic information, and a BiLSTM+CRF layer then performs the final label prediction. The experimental results indicate that the proposed method can capture the entities in a specific instance, mitigates global semantic loss and blurred word boundaries to a certain extent, and outperforms the comparison methods on four benchmark Chinese NER datasets.

(2) This thesis proposes a model based on multi-aspect feature fusion with a mutual attention mechanism. The model preprocesses multiple features at the input layer and extracts reasonable global fused semantic information through the mutual attention mechanism. Specifically, the multiple features comprise three kinds: an internal feature (bi-gram feature + position feature), a synonym feature, and an offline lexical feature. These three features are built from quite different underlying information, so they represent the input from different perspectives. The feature types are independent of each other, and a new feature can replace an old one as long as the feature dimensions are consistent. In the feature fusion stage, the model abandons a complex semantic encoder in favor of a simple and effective mutual attention mechanism, which uses a fusion parameter to balance character-level features and contextual features and thereby obtain enhanced fused features. Finally, the fused features from the three aspects are concatenated and sent to the decoding module for final label prediction. Because the features are only loosely coupled to the semantic encoder, the method has a certain degree of flexibility and transferability. Experimental results indicate that this method can effectively extract high-level global semantic information, thereby alleviating word ambiguity to a certain extent and improving the recognition performance of the model.
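
The abstract does not give the equations for the fusion step in (2), so the following is only a minimal sketch of one way a fusion parameter could balance character-level features against attention-derived contextual features. Everything in it is an assumption for illustration: the function names `attend` and `fuse`, the scalar parameter `alpha`, the use of scaled dot-product attention, and the one-directional attention (characters attending over one auxiliary feature set) are hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, keys):
    """Scaled dot-product attention of one query over key vectors.

    The keys double as values, so the result is a weighted average of
    the key vectors (an assumed, simplified form of attention).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys))
            for i in range(len(query))]


def fuse(char_feats, aux_feats, alpha=0.5):
    """Blend each character vector with its attention context.

    alpha is a hypothetical fusion parameter: alpha = 1.0 keeps only the
    character-level feature, alpha = 0.0 keeps only the context.
    """
    fused = []
    for c in char_feats:
        ctx = attend(c, aux_feats)
        fused.append([alpha * ci + (1.0 - alpha) * xi
                      for ci, xi in zip(c, ctx)])
    return fused


# Toy input: three character vectors and two auxiliary feature vectors,
# all with the same dimension (here 2), as the abstract requires.
chars = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
aux = [[0.2, 0.8], [0.9, 0.1]]
fused = fuse(chars, aux, alpha=0.7)
```

In a setup like the one the abstract describes, a step of this kind would presumably run once per feature aspect (internal, synonym, offline lexical), with the three fused outputs concatenated before decoding. In a trained model the fusion parameter would more likely be learned than fixed by hand.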
Keywords/Search Tags:Named entity recognition, Word enhancement, Global semantic information, Attention mechanism, Multi-feature fusion