
Research On Text Summary Generation Algorithm Based On Strict Format Control

Posted on: 2024-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: W J Qu
Full Text: PDF
GTID: 2568307163462894
Subject: Software engineering
Abstract/Summary:
As pre-trained language models have grown more capable, text summarization, which addresses the problem of information redundancy, has also made great progress. Current research on controllable summarization focuses mainly on generating summaries of controllable length for long texts. Because a summary written in a specific text format can better meet people's needs in certain situations, this thesis proposes a summarization model that addresses format-controllable summary generation. Taking Song Ci as an example, the model generates summaries of news datasets that follow the text format of Song Ci.

The proposed framework has two steps:
(1) An extractive summarization model, BARTSUM, is proposed to perform extractive summarization more efficiently. BART, a pre-trained language model that pairs a BERT-style bidirectional encoder with an autoregressive decoder, serves as the base model, and Transformer and LSTM components are combined with it to construct the model.
(2) A format-controllable summarization model, Format-Sum, is proposed on the basis of BARTSUM. Format-Sum adopts the abstractive (generative) summarization method and uses the BARTSUM model as its encoder. The embedding layer of the pre-trained language model on the decoder side is modified to record the format information of the training set, and the model is then fine-tuned so that it generates summaries with a controllable format.

Experimental results show that the BARTSUM model combined with Transformer performs best, and that its ROUGE scores on the CNN/Daily Mail dataset improve on those of previous summarization models. The Format-Sum model, in turn, generates summaries of the Xinhuanet news dataset that conform to the Song Ci format. This study offers a new direction for extending and applying controllable summary generation and provides a reference for future related research.
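To make the idea of a strict text format concrete, the following is a minimal sketch of how a generated summary could be checked against a line-length template. It is not the thesis's method: the template values, the separator set, and the function names `split_lines` and `matches_format` are all assumptions chosen for illustration.

```python
import re

# Hypothetical template: the number of characters expected in each line.
# These values are illustrative only and are not taken from the thesis.
TEMPLATE = [6, 6, 5, 6, 2, 2, 6]

# Punctuation (Chinese and ASCII) and whitespace that separate lines.
_SEPARATORS = re.compile(r"[，。？！；、,.?!;\s]+")


def split_lines(text):
    """Split a generated summary into lines at punctuation or whitespace."""
    return [part for part in _SEPARATORS.split(text) if part]


def matches_format(text, template=TEMPLATE):
    """Return True if every line has exactly the length the template requires."""
    lengths = [len(line) for line in split_lines(text)]
    return lengths == list(template)
```

A check of this kind only verifies line counts and line lengths. Other constraints of the Song Ci format, such as tonal patterns and rhyme, would need additional rules.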
Keywords/Search Tags: Text Summarization, Format Control, Pre-trained Language Model, Format-Sum