
Research On Chinese Idiom Representation Learning And Its Application

Posted on: 2022-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2505306563976089
Subject: Communication and Information System
Abstract/Summary:
With the rise of deep learning, Chinese natural language processing is developing rapidly, and text representation serves as its indispensable basic encoding layer. Idioms are used frequently in written and spoken Chinese and play an irreplaceable role in Chinese thought, so efficient idiom representation is crucial to the further development of Chinese natural language processing. Idioms are a distinctive phenomenon of the Chinese language. Their fixed four-character structure, simple form and rich content give rise to two major characteristics: non-compositionality and meaning integrity. That is, the meaning of an idiom cannot be obtained by simply adding up the meanings of its characters; it must be understood as a whole. These two characteristics make the current mainstream word-level and character-level representation methods unsuitable for direct application to Chinese idioms. To represent idioms effectively, we propose a multi-granularity representation model for Chinese idioms based on definition-augmented embedding, and use a cloze-style Chinese reading comprehension task to verify the quality of the representation. Finally, we apply the model to Chinese idiom questions from the college entrance examination and achieve good results.

The contributions of this paper are as follows:

1) We propose two representation models. (a) A contextual representation model based on the mixed embedding of characters and words. To integrate characters and words, we design two word vector alignment methods to solve the alignment problem and three fusion methods to model the interaction between characters and words. (b) An idiom representation model based on definition-augmented embedding. To screen the different components of the definition effectively, we design an attention mechanism that addresses two problems: word vectors cannot effectively represent idioms, and character information can confuse word information. Experiments on real Chinese machine reading comprehension tasks show that our model improves the performance of the current mainstream BiLSTM, AR and SAR reading comprehension models by up to 9.5%, which demonstrates the effectiveness and versatility of our method.

2) Through quantitative analysis of specific cases, we find that, for the representations of similar idioms obtained by the above model, the Euclidean distance is larger and the cosine similarity is smaller than for the baseline model. This shows that the proposed idiom representation model has a stronger ability to distinguish similar idioms than the baseline, and suggests that it is a general representation model with a wide range of applications.

3) We collect data and build a data set of idiom-related questions from the Chinese paper of the college entrance examination, and apply the above model to answer them. The experimental results show that the proposed model handles these questions well: its accuracy on the test set is 75.9%, well above the average human level of 66.7%.

The dissertation contains 10 figures, 19 tables and 53 references.
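The quantitative comparison of similar idioms relies on two standard measures, Euclidean distance and cosine similarity. The sketch below shows how such a comparison could be computed. The function names and the three-dimensional vectors are made up for illustration; they are not the thesis's embeddings or results.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy representations of two similar idioms (assumed values).
# A baseline that maps both idioms to almost the same point:
baseline_a = [0.9, 0.1, 0.3]
baseline_b = [0.8, 0.2, 0.3]
# A representation that separates the same two idioms more clearly:
model_a = [0.9, 0.1, 0.3]
model_b = [0.2, 0.7, 0.4]

for name, (a, b) in {
    "baseline": (baseline_a, baseline_b),
    "model": (model_a, model_b),
}.items():
    print(name, euclidean_distance(a, b), cosine_similarity(a, b))
```

In this toy setting the second pair has the larger Euclidean distance and the smaller cosine similarity, which is the pattern the abstract interprets as a stronger ability to tell similar idioms apart.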
Keywords/Search Tags:Natural language processing, Idiom representation, Chinese machine reading comprehension