
Named Entity Recognition With Multi-Grained Representation Learning

Posted on: 2023-08-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 1528307136499194
Subject: Information networks
Abstract/Summary:
With the rapid development of the big data era, natural language text, as one of the most important information carriers, appears frequently in people's daily lives. Text is an important channel through which people come to know and understand the world, and it also helps people express opinions and record history more effectively. In recent years, the rapid development of artificial intelligence technology has steadily improved the ability of machines to analyze and understand natural language text. To realize intelligent text processing, the design and performance of the basic natural language processing tasks are particularly critical, and many such tasks have emerged, including named entity recognition, part-of-speech tagging, and syntactic analysis. In particular, named entity recognition, which aims to identify all entity words in a sentence and predict their types, is a key foundation for higher-level natural language processing tasks such as knowledge graph construction, intelligent question answering, and machine translation.

This dissertation presents an in-depth study of the named entity recognition task. Drawing on the multi-level semantic representations found in natural language text (token-level, region-level, and sentence-level), it addresses several key issues in named entity recognition by introducing transfer learning, attention mechanisms, and multi-label classification learning from the field of deep learning. The main contributions are as follows:

(1) To address the problem of implicit feature extraction in named entity recognition models, this dissertation proposes a named entity recognition method based on adversarial transfer learning and adaptive multi-representation fusion. The method builds on the classic sequence labeling model and uses transfer learning to transfer knowledge among multi-task token-level representations; in particular, it adopts an adversarial training strategy to avoid noise interference during knowledge transfer. In addition, the method introduces an attention mechanism to achieve task-specific adaptive representation fusion. The proposed method incorporates multiple sequence tagging tasks (e.g., named entity recognition, part-of-speech tagging, and sentence segmentation) to make token-level feature extraction more comprehensive.

(2) To address the challenge posed by the complex nested structure of named entities, this dissertation proposes a named entity recognition method based on boundary head-tail detection and a token interaction tagger. This work designs a region-level classification method whose modeling strategy is derived from two key properties of named entities: (i) explicit boundary tokens and (ii) tight internal connections within the boundary. The method uses a head-tail detector, based on a self-attention mechanism and a bi-affine classifier, to learn the boundary information of entities, and a token interaction tagger, based on the sequence labeling model, to learn the internal connections within the boundary. The proposed method strengthens the region-level representation from the perspectives of both the entity boundary tokens and the tokens within the boundary, making the representations of similar regions in a sentence easier to distinguish.

(3) To address the diversity of named entity categories, this dissertation proposes a named entity recognition method based on sentence-level multi-label classification and a beam search algorithm. This work applies the sequence-to-sequence model to the named entity recognition task. Since the entity category depends on both the entity itself and its context in the sentence, this work places a category-oriented, sentence-level multi-label classification module between the encoder and decoder to further learn the entity category information in the sentence. The method also designs a restricted beam search algorithm to ensure the diversity and validity of the output of the sequence-to-sequence model in the testing phase.

(4) To address the conditional error propagation problem in named entity recognition models, this dissertation proposes a named entity recognition method based on parallel processing of entity boundary and category information. This work builds separate learning modules for entity boundary information and category information, and adopts a parallel processing strategy to overcome the limitation of traditional methods, which must learn entity-related information in a fixed order. In addition, the method uses a matching module based on a bi-affine classifier to predict the correlation between the region-level boundary representations and the sentence-level category representations learned by these modules.
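
Contributions (2) and (4) both rely on a bi-affine classifier to relate two representations. As a rough illustration only, the sketch below scores every (head token, tail token) pair with a generic bi-affine form, head^T U tail + w . [head; tail] + b, and keeps the pairs above a threshold as candidate entity regions. The function names, the toy vectors, the parameters `U`, `w`, `b`, and the threshold are all assumptions made for this example; the abstract does not specify the dissertation's actual formulation, dimensions, or training procedure.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def biaffine_score(head: Vector, tail: Vector, U: Matrix, w: Vector, b: float) -> float:
    """Generic bi-affine score: head^T U tail + w . [head; tail] + b."""
    bilinear = sum(
        head[i] * U[i][j] * tail[j]
        for i in range(len(head))
        for j in range(len(tail))
    )
    # Linear term over the concatenation of the two vectors.
    linear = sum(wk * xk for wk, xk in zip(w, head + tail))
    return bilinear + linear + b


def candidate_spans(
    head_reprs: List[Vector],
    tail_reprs: List[Vector],
    U: Matrix,
    w: Vector,
    b: float,
    threshold: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for every span with start <= end whose score exceeds threshold."""
    spans = []
    for i, head in enumerate(head_reprs):
        for j in range(i, len(tail_reprs)):
            score = biaffine_score(head, tail_reprs[j], U, w, b)
            if score > threshold:
                spans.append((i, j, score))
    return spans


# Hypothetical toy inputs: three tokens with 2-dimensional head and tail representations.
head_reprs = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
tail_reprs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]   # assumed bilinear weights (identity here)
w = [0.0, 0.0, 0.0, 0.0]       # assumed linear weights over [head; tail]
b = -0.5                       # assumed bias

spans = candidate_spans(head_reprs, tail_reprs, U, w, b)
print(spans)  # [(0, 1, 0.5), (1, 2, 0.5)]
```

In this toy run the two surviving spans overlap at token 1, which is the kind of output a region-level scorer can produce and a purely token-level tagger with one label per token cannot.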
Keywords/Search Tags: Named Entity Recognition, Boundary Detection, Adversarial Transfer Learning, Attention Mechanism, Multi-label Classification