Research on a Vietnamese-Chinese Low-resource Cross-language Summarization Method Based on Word-level Key Information Guidance

Posted on: 2023-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: X M Li
GTID: 2555306797473014
Subject: Signal and Information Processing
Abstract/Summary:
Under the influence of the rapid development of the Internet and the One Belt One Road policy, China has close contacts with other countries such as Vietnam. Under current network conditions, a large amount of public opinion information is generated every day. Vietnamese-Chinese cross-lingual summarization is an effective way to grasp and read this information in a timely manner. Several problems in the Vietnamese-Chinese cross-lingual summarization task remain unsolved. The most prominent is the difficulty of semantic alignment, which makes cross-lingual summaries unclear in their description of facts and their expression of topics. To address these problems, this paper finds that bilingual word-level information can effectively guide summary generation; that is, probability mapping information for local information such as keywords, and for global information such as topic association graphs, is very important for summary generation. Based on this idea, we propose a Vietnamese-Chinese low-resource cross-lingual summarization method guided by word-level key information to improve summary quality. This paper mainly completes the following research work:

(1) Low-resource Vietnamese-Chinese cross-lingual summarization combined with keyword probability mapping. The lack of Vietnamese-Chinese cross-lingual summarization data leads to poor model performance and causes factual errors in the summaries the model generates. However, the discrete keyword information in a text can characterize its factual content, so we use keyword probability mapping information to enhance the factuality of the summary. To this end, this paper proposes a low-resource Vietnamese-Chinese cross-lingual summarization method combined with keyword probability mapping. First, source-language keywords are used to extract important information, and these keywords are mapped to the target language through probability mapping pairs. Finally, the keywords mapped to the target language are integrated into the summary generation process based on a pointer network. Experimental results on the constructed Vietnamese-Chinese cross-lingual summarization dataset show that this method effectively enhances the factuality of summaries compared with traditional sequence-to-sequence models.

(2) Low-resource Vietnamese-Chinese cross-lingual summarization fused with topic association graphs. Vietnamese and Chinese texts are highly consistent in topic structure, yet existing cross-lingual summarization models can generate summaries that are inconsistent with the topic of the original text. Structural information such as a topic association graph can condense the topic information of a text well from a global perspective. To this end, this paper proposes a low-resource Vietnamese-Chinese cross-lingual summarization method with topic association graphs. First, topic words are obtained from the source-language text and mapped through the Vietnamese-Chinese probability mapping to construct a topic association graph. Then, a graph encoder and a sequence encoder are used as dual encoders to generate representations. Finally, the decoder attends to both the topic association graph representation and the distribution generated by the neural network to generate summaries. Experiments on the constructed Vietnamese-Chinese cross-lingual summarization dataset show that introducing a topic association graph with global structural information effectively improves the topic consistency of summaries.

(3) Low-resource Vietnamese-Chinese cross-lingual summarization prototype system. Combining the cross-lingual summarization methods fused with keyword probability mapping and with topic association graphs, we design and implement a low-resource Vietnamese-Chinese cross-lingual summarization system. The system integrates components such as a search engine and the cross-lingual summarization algorithms. Finally, the Vietnamese-Chinese news list, retrieval, detailed news view, and cross-lingual summarization interaction are presented through a user interface.
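
To make the keyword probability mapping step in (1) more concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes a toy Vietnamese-to-Chinese probability table (`PROB_MAP`), keyword weights, and a fixed generation probability `p_gen`; all of these names and values are hypothetical and chosen only for illustration. It projects weighted source keywords onto target-language words and then blends that distribution with a decoder vocabulary distribution in the style of a pointer network.

```python
from collections import defaultdict

# Hypothetical Vietnamese -> Chinese mapping probabilities (toy values,
# not taken from the thesis).
PROB_MAP = {
    "kinh tế": {"经济": 0.9, "财政": 0.1},
    "hợp tác": {"合作": 0.8, "协作": 0.2},
}


def map_keywords(keywords, prob_map):
    """Project weighted source keywords onto a target-language distribution."""
    scores = defaultdict(float)
    for keyword, weight in keywords.items():
        # Keywords missing from the table contribute nothing.
        for target, prob in prob_map.get(keyword, {}).items():
            scores[target] += weight * prob
    total = sum(scores.values())
    if total == 0:
        return {}
    return {target: score / total for target, score in scores.items()}


def mix_distributions(vocab_dist, keyword_dist, p_gen):
    """Pointer-style mixture: p_gen * generate + (1 - p_gen) * copy keywords."""
    words = set(vocab_dist) | set(keyword_dist)
    return {
        word: p_gen * vocab_dist.get(word, 0.0)
        + (1.0 - p_gen) * keyword_dist.get(word, 0.0)
        for word in words
    }


# Assumed keyword weights (e.g. from a keyword extractor) and decoder output.
source_keywords = {"kinh tế": 0.6, "hợp tác": 0.4}
keyword_dist = map_keywords(source_keywords, PROB_MAP)
vocab_dist = {"经济": 0.5, "的": 0.5}
final_dist = mix_distributions(vocab_dist, keyword_dist, p_gen=0.7)
```

In a trained model, `p_gen` would normally be predicted at each decoding step rather than fixed, and the mapping table would be learned or estimated from bilingual data; the sketch only shows how mapped keywords can raise the probability of fact-bearing target words.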
Keywords/Search Tags:Vietnamese-Chinese Cross-lingual Summarization, Word-level Key Information, Probability Mapping, Keywords, Topic Association Graph