Although neural sequence-learning methods have made significant progress on single-document summarization, they still produce unsatisfactory results on multi-document summarization. Three main challenges stand out: 1) multi-document summarization involves a larger search space yet more limited training data, which makes it difficult for neural methods to learn adequate representations; 2) information across multiple source documents is highly redundant and higher-level cross-document relations must be resolved, so single-document summarization methods are less effective; 3) neural abstractive models can generate summaries that overlap heavily with human references, but existing models are not optimized for factual correctness, a key criterion in real applications. To address these problems, this work makes the following contributions. First, a self-supervision-based method is proposed to enhance the model's ability to represent "facts": factual information is first extracted from the source documents, noise is then injected through operations such as fact masking, fact deletion, and sentence reordering, and the model is trained to restore the original text, strengthening its fact representations. Second, a contrastive-learning-based method is proposed to help the model generate "facts" accurately: during training, the decoded output is compared against positive and negative examples of factual knowledge to optimize the factual correctness of the generated summary. Third, a general multi-document summarization framework that explicitly strengthens fact representation is proposed. The framework uses a key-information extraction module to reduce the redundancy of the input documents and thereby process multiple inputs more effectively; this is essential for long documents and helps produce coherent, concise summaries. A fact-knowledge extraction module and a pointer decoding
module based on fact attention are further designed so that the model explicitly attends to the factual knowledge the summary should contain. Experimental results on the CNN/DailyMail and MultiNews datasets show that the proposed architecture brings substantial improvements over several strong baselines, and ablation studies confirm the effectiveness of each fact-enhancement method.
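The self-supervised noising step described in the first contribution can be illustrated with a minimal sketch. The function names and the span-based representation of "facts" below are illustrative assumptions, not the authors' implementation: facts are taken to be pre-extracted token spans, and each corruption (masking, deletion, sentence reordering) produces an input the model would be trained to restore.

```python
import random

def mask_fact_spans(tokens, fact_spans, mask_token="[MASK]"):
    """Replace each (start, end) fact span with a single mask token."""
    out, i = [], 0
    for start, end in sorted(fact_spans):
        out.extend(tokens[i:start])
        out.append(mask_token)
        i = end
    out.extend(tokens[i:])
    return out

def delete_fact_spans(tokens, fact_spans):
    """Drop the tokens inside each fact span entirely."""
    out, i = [], 0
    for start, end in sorted(fact_spans):
        out.extend(tokens[i:start])
        i = end
    out.extend(tokens[i:])
    return out

def shuffle_sentences(sentences, rng=None):
    """Permute sentence order; the model must recover the original order."""
    rng = rng or random.Random(0)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    return shuffled
```

In training, each corrupted sequence would be fed to the encoder-decoder with the uncorrupted text as the reconstruction target, forcing the model to encode the masked or deleted factual content.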
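The contrastive objective in the second contribution can be sketched as an InfoNCE-style loss: the summary representation is pulled toward embeddings of correct fact examples and pushed away from corrupted ones. The vector representation, function names, and temperature parameter below are illustrative assumptions, not the paper's actual loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_fact_loss(summary_vec, pos_vecs, neg_vecs, tau=0.1):
    """InfoNCE-style loss: low when the summary embedding is closer to
    positive (correct) fact embeddings than to negative (corrupted) ones."""
    pos = [math.exp(cosine(summary_vec, p) / tau) for p in pos_vecs]
    neg = [math.exp(cosine(summary_vec, n) / tau) for n in neg_vecs]
    return -math.log(sum(pos) / (sum(pos) + sum(neg)))
```

A decoded summary whose representation drifts toward a negative (factually corrupted) example incurs a higher loss, so minimizing this term during training rewards factually consistent generations.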