Research On Multi-Document Summarization Method With Text Association

Posted on:2024-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhong

Full Text:PDF

GTID:2568306941963699

Subject:Computer technology

Abstract/Summary:

Text summarization is a core problem in natural language processing. With the rapid development of the Internet, the demand for quickly extracting target information from massive data keeps growing, and with it the range of applications of multi-document summarization. Multi-document summarization analyzes, refines, and integrates a set of topically related documents and generates a summary that covers the main content of all of them. However, some existing approaches simply concatenate the source documents into one long sequence and model the task as a long sequence-to-sequence problem, without considering document-level associations. In addition, the combined input often exceeds the encoder's length limit; if the input is truncated, key information is easily lost and the context information in the document set is wasted. Other work learns cross-document relations by mining co-occurring words or entities, but because language expressions are diverse, such co-occurrences struggle to capture the implicit connections between documents. This thesis therefore focuses on how to effectively exploit the relationships between documents to improve the accuracy and comprehensiveness of multi-document summarization. The research covers the following three aspects.

First, to address the fact that traditional multi-document summarization work ignores document-level associations, this thesis proposes a multi-document summarization method based on an association discrimination model. The method first combines a Siamese network with the pre-trained language model BERT to build a twin-tower model for association discrimination. The association discrimination model then produces a representation for each sentence in a pair and concatenates the two representations. The model judges the semantic relationship between any two sentences from three perspectives: whether they belong to the same topic, whether they come from the same source text, and whether they are adjacent (preceding and following) sentences. Training the association discrimination model also updates the parameters of the summarization model. Finally, the summarization model selects the sentences that best represent the main content of the document set and organizes them into a summary. Experimental results show that, compared with traditional multi-document extractive summarization methods, this method achieves a substantial improvement in ROUGE scores.

Second, to exploit associations of different dimensions among multiple documents, this thesis proposes a multi-document summarization method based on multi-dimensional association construction. The method first divides the document set into semantic nodes of three dimensions (topic, source document, and sentence) and encodes the nodes of each dimension with the pre-trained model BERT. It then constructs a multi-dimensional multi-document association graph from the multi-level relationships between document-level and sentence-level nodes, and uses a graph convolutional network to capture cross-document relationships in the document set from different aspects. Finally, the association graphs of the different dimensions are integrated to guide summary extraction. Experimental results show that this method makes full use of the multi-dimensional relationships between documents and clearly outperforms the baseline methods.

Finally, to address the loss of context information when long texts are truncated during encoding, this thesis proposes a multi-document summarization method that incorporates reference relations. The method first analyzes the reference relations between sentences in each source document, uses a graph attention network to capture them, and extracts candidate content from each source document. It then adds segment embeddings and source embeddings to the embedding layer of the pre-trained BERT model, so that the encoding layer learns the hierarchical relationship between documents, handles the input of multiple sentences from the document set, and yields more accurate vector representations of each sentence. Finally, the extracted candidate content is concatenated and fed into the modified BERT model, which further scores sentence importance; the top-ranked sentences form the summary. Experimental results show that this method effectively filters important information from the original document set and improves performance on multi-document summarization.
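As a rough illustration of the three association perspectives used in the first method (same topic, same source text, adjacent sentences), the following sketch derives the three binary labels for every sentence pair from sentence metadata. It is a minimal sketch under stated assumptions and not the thesis's implementation: the `Sentence` fields and the names `association_labels` and `build_pairs` are hypothetical, `doc_id` is assumed to be unique only within a topic, and the BERT-based twin-tower encoder that would consume these pairs is omitted.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Sentence:
    # Hypothetical metadata layout; the abstract does not specify one.
    text: str
    topic_id: int   # topic (document set) the sentence belongs to
    doc_id: int     # source document, assumed unique only within a topic
    position: int   # index of the sentence inside its source document


def association_labels(a: Sentence, b: Sentence) -> dict:
    """Return the three pairwise labels named in the abstract."""
    same_topic = a.topic_id == b.topic_id
    # Two sentences share a source text only if topic and document both match.
    same_source = same_topic and a.doc_id == b.doc_id
    # "Preceding and following" is read here as directly neighbouring positions.
    adjacent = same_source and abs(a.position - b.position) == 1
    return {
        "same_topic": same_topic,
        "same_source": same_source,
        "adjacent": adjacent,
    }


def build_pairs(sentences):
    """Label every unordered sentence pair as (sentence_a, sentence_b, labels)."""
    return [(a, b, association_labels(a, b)) for a, b in combinations(sentences, 2)]


if __name__ == "__main__":
    demo = [
        Sentence("Floods hit the coastal region.", 0, 0, 0),
        Sentence("Thousands were evacuated.", 0, 0, 1),
        Sentence("Rescue teams arrived on Monday.", 0, 1, 0),
        Sentence("The central bank raised rates.", 1, 0, 0),
    ]
    for a, b, labels in build_pairs(demo):
        print(a.text, "|", b.text, "|", labels)
```

In a training setup, each labeled pair could serve as one example for three binary classification heads on top of the concatenated sentence representations; how the thesis actually weights or combines these objectives is not stated in the abstract.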

Keywords/Search Tags:

Multi-document Summarization, Text Association, Pre-training Model, Graph Neural Network

Related items

1	Research On Deep Neural Networks Based Automatic Text Summarization
2	Multi-Document Automatic Summarization Based On The Term-Sentences-Document Tri-layer Graph Model
3	The Research On Multi-document Summarization Generation Method Based On Text Relation Graph
4	Optimization Of Abstract Sentence Combination Based On Pre-training Model
5	Research Of Hybrid Text Summarization User Dynamic Interest Model Technology Based On Deep Learning
6	Chinese Text Summarization Technology Based On Improved BERT Pre-training Model And Graph Neural Network
7	Research On Multi-document Summarization Models With Graph Structured Semantics Representation And Redundancy Control Mechanism
8	Research On Text Classification And Automatic Summarization Based On Distributed Representation
9	Research On Graph Based Models For Multi-document Summarization
10	Multi-Document Summarization Generation And Application Based On Domain Knowledge Graph