
Research On Unsupervised Named Entity Recognition Based On Cross-lingual Transfer

Posted on: 2020-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Q Wu
GTID: 2428330590473221
Subject: Computer technology
Abstract/Summary:
Named entities carry important semantic information in natural language text, and recognizing and classifying them is a core research topic in natural language processing. With the wide adoption of deep learning in NLP, neural named entity recognition (NER) models have achieved strong results, but their success usually depends on large-scale annotated data. For widely spoken languages with rich corpus resources, manually labeled data may be relatively easy to obtain. For most low-resource languages, however, manually annotated NER data is scarce or entirely absent, and annotating these languages is difficult. To address this shortage, this thesis studies how to make full use of the manually annotated NER data available in resource-rich languages and carry it over to low-resource languages through cross-lingual transfer. It focuses on the unsupervised setting, in which the low-resource language has no labeled NER data at all. Around this setting, the work covers the following three aspects.

(1) Projection-based cross-lingual transfer for unsupervised NER. For a low-resource language without any annotated data, this thesis uses two different cross-lingual annotation-projection methods to transfer manual annotations from a high-resource language to the low-resource language. Bilingual word alignment and cross-lingual word embeddings are used to project the annotations between languages. In addition, an attention mechanism is added to an existing end-to-end NER framework to better model the dependencies between words in the sentences to be labeled.

(2) Unsupervised NER based on cross-lingual transfer learning. Because the annotations obtained through cross-lingual projection are noisy, this thesis proposes a cross-lingual transfer model for NER that replaces the rule-based mapping used in existing projection methods. To apply this transfer model to NER in the low-resource language, the thesis adopts the posterior regularization framework, using the cross-lingual transfer model as a regularizer for the low-resource NER model, and builds a joint training framework for the two models. The cross-lingual transfer model and the low-resource NER model are trained iteratively with the expectation-maximization (EM) algorithm, yielding a more accurate low-resource NER model.

(3) Cross-lingual unsupervised NER using bilingual data. Because the pseudo-labeled data available in the low-resource language is small in scale, this thesis proposes a low-resource NER method that introduces external bilingual data. To reduce the noise that this unsupervised bilingual data introduces, it proposes a word-alignment-based method for filtering bilingual pseudo-labeled data. The filtered bilingual pseudo-labeled data is then added to the joint training framework to further improve NER performance in the low-resource language.
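Aspect (1) projects annotations through bilingual word alignment. The abstract does not give the projection rules, so the following is only a minimal sketch of a common alignment-based scheme. It assumes BIO tags on the source side and alignments given as (source index, target index) pairs from some external aligner; `source_spans` and `project_labels` are hypothetical names. Each source entity span is mapped to the smallest contiguous target span that covers its aligned tokens.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def source_spans(tags: List[str]) -> List[Span]:
    """Extract entity spans from a BIO tag sequence."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the open span keeps growing
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":  # B-X, or an I-X that does not continue the open span
            start, etype = i, tag[2:]
    return spans


def project_labels(src_tags: List[str], alignment: List[Tuple[int, int]],
                   tgt_len: int) -> List[str]:
    """Project source BIO tags onto a target sentence of length tgt_len."""
    tgt_tags = ["O"] * tgt_len
    for start, end, etype in source_spans(src_tags):
        # Target positions aligned to any token inside this source entity.
        targets = sorted({t for s, t in alignment if start <= s < end})
        if not targets:
            continue  # unaligned entity: dropped from the projection
        lo, hi = targets[0], targets[-1]
        tgt_tags[lo] = "B-" + etype
        for j in range(lo + 1, hi + 1):
            tgt_tags[j] = "I-" + etype
    return tgt_tags
```

For example, projecting `["O", "B-LOC", "I-LOC", "O"]` ("The New York office") onto a five-token target with alignment `[(0, 0), (3, 1), (1, 3), (2, 4)]` tags the last two target tokens as `B-LOC`, `I-LOC`. Unaligned entities are silently dropped and overlapping projected spans overwrite each other, which is one source of the projection noise that aspect (2) addresses.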
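Aspect (1) also uses cross-lingual word embeddings for projection. One widely used way to do this, shown here only as an assumed illustration and not as the thesis's exact method, is to translate each labeled source sentence word by word through nearest neighbours in a shared embedding space and keep the source labels, which yields pseudo-labeled target sentences. The toy two-dimensional vectors and the name `translate_with_labels` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embeddings = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def translate_with_labels(src_tokens: List[str], src_tags: List[str],
                          src_emb: Embeddings,
                          tgt_emb: Embeddings) -> Tuple[List[str], List[str]]:
    """Word-by-word nearest-neighbour translation that carries labels over."""
    out_tokens = []
    for tok in src_tokens:
        vec = src_emb.get(tok)
        if vec is None:
            out_tokens.append(tok)  # OOV: copy the source word unchanged
            continue
        # Target word whose embedding is most similar in the shared space.
        out_tokens.append(max(tgt_emb, key=lambda w: cosine(vec, tgt_emb[w])))
    return out_tokens, list(src_tags)  # labels keep their positions
```

Copying out-of-vocabulary tokens unchanged is a deliberate choice here: many names are spelled identically across languages that share a script, so copying them is often the right translation.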
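The abstract does not say which attention variant is added to the NER framework in aspect (1). A minimal sketch, assuming scaled dot-product self-attention over per-token hidden states (for example the outputs of a BiLSTM encoder), with each token's context vector concatenated to its own state before the tagging layer, could look like this in plain Python; `self_attention` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h: List[Vector]) -> List[Vector]:
    """Return [h_i ; c_i] for each token, where c_i attends over all tokens."""
    d = len(h[0])
    out = []
    for q in h:
        # Scaled dot-product scores of token q against every token k.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        weights = softmax(scores)
        context = [sum(weights[j] * h[j][i] for j in range(len(h)))
                   for i in range(d)]
        out.append(q + context)  # list concatenation: dimension becomes 2d
    return out
```

Concatenating rather than replacing keeps each token's own representation available to the tagger, while the context half lets distant words in the sentence influence the label.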
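Aspect (2) couples the transfer model and the low-resource NER model through posterior regularization and trains them with EM. The abstract does not give the objective, so the sketch below is a deliberately simplified, token-level stand-in. The transfer model's per-token label posteriors are held fixed here (the thesis trains both models iteratively) and act as a soft prior, so the E-step posterior is q(y) proportional to p_target(y | w) * p_transfer(y)^lam. This is a product-of-experts approximation, not the full KL projection of posterior regularization. The M-step re-estimates a toy word-to-label table from q. The label set, `lam`, `smoothing`, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

LABELS = ["O", "PER", "LOC"]  # hypothetical label set
Dist = Dict[str, float]


def normalize(d: Dist) -> Dist:
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}


def e_step(sentences: List[List[str]], target_model: Dict[str, Dist],
           transfer_post: List[List[Dist]], lam: float) -> List[List[Dist]]:
    """q(y_i) is proportional to p_target(y_i | w_i) * p_transfer(y_i) ** lam."""
    uniform = {y: 1.0 / len(LABELS) for y in LABELS}
    qs = []
    for sent, post in zip(sentences, transfer_post):
        q_sent = []
        for word, p_tr in zip(sent, post):
            p_tg = target_model.get(word, uniform)
            q_sent.append(normalize({y: p_tg[y] * p_tr[y] ** lam for y in LABELS}))
        qs.append(q_sent)
    return qs


def m_step(sentences: List[List[str]], qs: List[List[Dist]],
           smoothing: float = 0.1) -> Dict[str, Dist]:
    """Re-estimate p_target(y | w) from smoothed expected label counts under q."""
    counts: Dict[str, Dist] = defaultdict(lambda: {y: smoothing for y in LABELS})
    for sent, q_sent in zip(sentences, qs):
        for word, q in zip(sent, q_sent):
            for y in LABELS:
                counts[word][y] += q[y]
    return {w: normalize(c) for w, c in counts.items()}


def train(sentences: List[List[str]], transfer_post: List[List[Dist]],
          lam: float = 1.0, iters: int = 5) -> Dict[str, Dist]:
    target_model: Dict[str, Dist] = {}  # empty model: uniform for every word
    for _ in range(iters):
        qs = e_step(sentences, target_model, transfer_post, lam)
        target_model = m_step(sentences, qs)
    return target_model
```

Larger `lam` trusts the transfer model more; with `lam = 0` the prior vanishes and the target model only reinforces its own guesses, which is why some coupling to the transfer signal is needed in the unsupervised setting.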
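For aspect (3), the abstract says only that bilingual pseudo-labeled data is filtered using word alignment. The criterion below is an assumption made for illustration: a sentence pair is kept only if enough source entity tokens have an alignment link (all of them by default) and no target token is aligned both to an entity token and to a non-entity token. `keep_pair`, `filter_pairs`, and `min_coverage` are hypothetical names.

```python
from typing import List, Tuple

Alignment = List[Tuple[int, int]]  # (source index, target index) pairs


def keep_pair(src_tags: List[str], alignment: Alignment,
              min_coverage: float = 1.0) -> bool:
    """Decide whether a pseudo-labeled sentence pair is clean enough to keep."""
    ent_tokens = {i for i, tag in enumerate(src_tags) if tag != "O"}
    if not ent_tokens:
        return True  # no entities, so nothing can be mis-projected
    aligned_src = {s for s, _ in alignment}
    coverage = len(ent_tokens & aligned_src) / len(ent_tokens)
    # Target tokens reached from entity vs. non-entity source tokens.
    from_entity = {t for s, t in alignment if s in ent_tokens}
    from_other = {t for s, t in alignment if s not in ent_tokens}
    conflict = bool(from_entity & from_other)
    return coverage >= min_coverage and not conflict


def filter_pairs(pairs: List[Tuple[List[str], Alignment]],
                 min_coverage: float = 1.0) -> List[Tuple[List[str], Alignment]]:
    """Keep only the (source tags, alignment) pairs that pass keep_pair."""
    return [p for p in pairs if keep_pair(p[0], p[1], min_coverage)]
```

Lowering `min_coverage` trades precision for more training data, which matters here because the point of aspect (3) is to enlarge a small pseudo-labeled set without letting alignment errors through.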
Keywords/Search Tags: Low-resource language named entity recognition, Unsupervised learning, Cross-lingual transfer