Research On Multi-document Summarization Models With Graph Structured Semantics Representation And Redundancy Control Mechanism

Posted on: 2022-10-30 | Degree: Master | Type: Thesis
Country: China | Candidate: R F Pan | Full Text: PDF
GTID: 2568306323977399 | Subject: Computer technology
Abstract/Summary:
In an era of exponential growth in text data, understanding large documents and extracting valuable information from them is of great importance. The Chinese multi-document summarization task aims to generate a coherent, non-redundant, and grammatically readable summary from a cluster of Chinese documents on a single topic. At present, there is relatively little domestic research on Chinese multi-document summarization, and no public Chinese multi-document summarization dataset is available. Most existing solutions either merge the multiple documents into a single document and then summarize it, or select summary sentences extractively. However, these methods generally cannot effectively detect salient text information or reason about the semantic relationships between texts across documents, and summaries produced by abstractive methods usually suffer from high redundancy.

Therefore, this thesis constructs a Chinese multi-document summarization dataset and, on that basis, studies how an abstractive Chinese multi-document summarization method can better detect salient text information, reason about the semantic relationships between document texts, and avoid redundancy when generating a summary. The research follows two lines. The first incorporates graph attention network encoding into an encoder-decoder model; explicitly encoding graph-structured data helps the summarization model capture salient information in the source documents and reason about the semantic relationships between texts. The second builds on the first: to address redundancy in the generated summaries and to further help the model identify salient text information in the source documents, it introduces the maximal marginal relevance (MMR) algorithm on the decoding side.

Experimental analysis shows that, on the Chinese multi-document summarization test set constructed in this thesis, both methods achieve a significant performance improvement over several strong earlier summarization models. In addition, comparative analysis of the generated summaries indicates that incorporating the graph neural network effectively improves the model's ability to detect salient information and reason about semantic relations, and that introducing the maximal marginal relevance algorithm effectively avoids redundancy when generating Chinese summaries.
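
For readers unfamiliar with maximal marginal relevance, the following is a minimal sketch of the classical greedy MMR selection rule, in which each candidate is scored as lam * relevance - (1 - lam) * redundancy. It is not the thesis's decoder-side integration, which the abstract does not detail. The function names (`cosine`, `mmr_select`), the bag-of-words cosine similarity, the whitespace tokenization, and the parameters `k` and `lam` are all assumptions made for illustration; Chinese text would first need a word segmenter.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick up to k sentence indices using the MMR criterion.

    Hypothetical helper: relevance is similarity to the query, redundancy
    is the highest similarity to any sentence already selected.
    """
    # Assumption: whitespace tokenization (not suitable for raw Chinese text).
    vecs = [Counter(s.lower().split()) for s in sentences]
    qvec = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], qvec)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected
```

With `lam` close to 1 the selection is driven almost entirely by relevance; lowering `lam` penalizes candidates that repeat what has already been selected, which is the redundancy-control behavior the abstract refers to.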
Keywords/Search Tags: Multi-document Summarization, Abstractive Method, Graph Attention Neural Network, Maximal Marginal Relevance Algorithm