Research On Two-stage Long Text Summarization Model Based On BART And Hierarchical Encoding

Posted on:2023-03-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Chen

Full Text:PDF

GTID:2568307046493564

Subject:Computer technology

Abstract/Summary:


With the rapid development of the Internet, the volume of online text is growing explosively, which makes it hard to obtain valuable information quickly. Automatic text summarization uses computer technology to condense the main content of a text automatically, helping people obtain information more efficiently. Most existing research on automatic text summarization focuses on short texts such as news articles and reviews, and this research has made considerable progress. However, as application scenarios evolve, the demand for summarizing long texts is increasing. Automatic summarization methods fall into two categories, extractive and abstractive, and both have significant limitations when applied to long texts. To address this problem, this thesis adopts a two-stage approach that divides long text summarization into key sentence extraction and summary generation, and designs a separate model for each stage. The main work and contributions are as follows:

(1) The pretrained model BART performs well on short text summarization, but its architecture prevents it from handling long texts: its learned positional embeddings cover only 1,024 positions, and the cost of its full self-attention grows quadratically with input length. Building on BART, this thesis constructs Long-BART (LBART) by introducing a sparse self-attention mechanism and extending the positional encoding, so that LBART can process longer inputs and is suited to long text summarization. In addition, the thesis proposes several strategies for reconstructing the training data. These strategies prevent the low utilization of training data caused by the model's input length limit and effectively improve training.

(2) The thesis proposes a key sentence extractor based on hierarchical encoding, named Hierarchical Extractor (HiExt), which extracts key sentences from long texts to guide the generator toward higher-quality summaries. Existing extractive summarization models often ignore the hierarchical structure of long texts, so the extractor is designed around hierarchical encoding to fully exploit this structure. A hierarchical encoder first produces representations of the sentences, sections, and whole document, and an attention mechanism then fuses these representations to improve extraction.

(3) The thesis combines the extractor and the generator into a two-stage summarization model. To demonstrate its effectiveness, the model is compared with ten other summarization models on two public datasets, PubMed and arXiv. Under ROUGE evaluation, on the arXiv dataset the model achieves ROUGE-1, ROUGE-2, and ROUGE-L scores of 48.016, 20.116, and 42.593, respectively, outperforming all compared models. On the PubMed dataset it achieves ROUGE-1, ROUGE-2, and ROUGE-L scores of 47.982, 20.863, and 42.315, respectively, ranking first in ROUGE-1 and second in both ROUGE-2 and ROUGE-L among all compared models.
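
The abstract does not say which sparse attention pattern LBART uses or how its positional encoding is extended. The NumPy sketch below illustrates both ideas under stated assumptions. It assumes a Longformer-style sliding window, in which each token attends only to neighbours within `window` positions. It also assumes the learned position table is lengthened by repeating the pretrained rows, the initialization LED used when extending BART's 1,024 positions. The function names `sliding_window_mask`, `extend_position_embeddings`, and `sparse_attention` are hypothetical and are not taken from the thesis.

```python
import numpy as np

# Hypothetical helpers illustrating the two LBART modifications; the attention
# pattern (sliding window) and the initialization scheme (row copying) are assumptions.


def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query i may attend to key j, i.e. |i - j| <= window."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window


def extend_position_embeddings(table: np.ndarray, new_len: int) -> np.ndarray:
    """Lengthen a learned position table by repeating its pretrained rows."""
    old_len = table.shape[0]
    reps = -(-new_len // old_len)  # ceiling division
    return np.tile(table, (reps, 1))[:new_len].copy()


def sparse_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, window: int):
    """Scaled dot-product attention restricted to a sliding window.

    For clarity this builds the full (n, n) score matrix and then masks it.
    An efficient kernel would compute only the band, costing O(n * window).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(sliding_window_mask(q.shape[0], window), scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # the diagonal keeps every row finite
    weights = np.exp(scores)                      # masked entries become exactly 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pretrained = rng.normal(size=(8, 4))              # toy "pretrained" table: 8 positions
    positions = extend_position_embeddings(pretrained, 20)
    x = rng.normal(size=(20, 4)) + positions          # token embeddings plus positions
    out, w = sparse_attention(x, x, x, window=2)
    print(out.shape)  # (20, 4)
```

Copying the rows keeps the pretrained weights useful at every position, so fine-tuning can start from BART rather than from random position vectors. The window size trades context per token against memory.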

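The abstract describes HiExt and the two-stage pipeline only at a high level. The following NumPy sketch illustrates the idea and relies on several assumptions that the source does not state:

- Sentence vectors are given, for example from any sentence encoder.
- Section and document representations come from mean pooling, as a stand-in for the learned hierarchical encoder.
- Fusion is a softmax attention over the sentence, section, and document vectors, with the sentence as the query.
- A logistic layer with hypothetical weights `w` and `b` scores each sentence.

The hypothetical `select_for_generator` then keeps the top-k sentences in their original order as the shortened input for the second, generation stage.

```python
import numpy as np

# Hypothetical sketch of a hierarchical key-sentence scorer in the spirit of
# HiExt; the pooling, fusion and scoring layer are illustrative assumptions.


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def score_sentences(sent_vecs, section_ids, w, b=0.0) -> np.ndarray:
    """Return a score in (0, 1) for every sentence.

    Section vectors are the mean of their sentence vectors, and the document
    vector is the mean of the section vectors (stand-ins for learned encoders).
    """
    sent_vecs = np.asarray(sent_vecs, dtype=float)
    section_ids = np.asarray(section_ids)
    sections = {s: sent_vecs[section_ids == s].mean(axis=0) for s in np.unique(section_ids)}
    doc_vec = np.mean(list(sections.values()), axis=0)
    scale = np.sqrt(sent_vecs.shape[1])
    scores = []
    for vec, sec in zip(sent_vecs, section_ids):
        levels = np.stack([vec, sections[sec], doc_vec])  # sentence, section, document
        attn = softmax(levels @ vec / scale)               # the sentence acts as the query
        fused = attn @ levels                              # attention-weighted fusion
        scores.append(1.0 / (1.0 + np.exp(-(fused @ w + b))))  # logistic scoring layer
    return np.array(scores)


def select_for_generator(sentences, scores, k: int) -> str:
    """Keep the k highest-scoring sentences, restored to document order."""
    top = sorted(np.argsort(scores)[::-1][:k])
    return " ".join(sentences[i] for i in top)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sentences = [f"Sentence {i}." for i in range(6)]
    vecs = rng.normal(size=(6, 8))
    scores = score_sentences(vecs, [0, 0, 1, 1, 2, 2], w=rng.normal(size=8))
    print(select_for_generator(sentences, scores, k=3))
```

Restoring document order after selection keeps the extracted text readable as a condensed document. In a two-stage setup this text, rather than the full article, would be fed to the generator (LBART in the thesis).
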
Keywords/Search Tags:

automatic text summarization, long text, two-stage summarization model, BART, hierarchical encoding


Related items

1	A Research On Text Summarization Model Based On BART
2	Research On Content Semantic Analysis Based Text Summarization Methods
3	Research Of Automatic Summarization Oriented To News Text
4	Research On Automatic Text Summarization In Chinese
5	Research On Automatic Text Summarization Algorithm For Chinese Long Text
6	Research On Deep Neural Networks Based Automatic Text Summarization
7	Research Of Automatic Text Summarization Based On Selective Encoding Model
8	Research On Key Techniques Of Two Phase Automatic Summarization Algorithm For Long Text
9	Research And Implementation Of Automatic Text Summarization Technology Based On Deep Learnin
10	Research On Automatic Text Summarization Algorithm For Chinese And English Long Text