
Research on Text Summarization Generation Based on Pre-trained Language Models

Posted on: 2024-04-20    Degree: Master    Type: Thesis
Country: China    Candidate: J Z Sun    Full Text: PDF
GTID: 2568307079459234    Subject: Surveying Science and Technology
Abstract/Summary:
In today's society, the amount of textual information is growing exponentially. People receive not only a huge volume of information but also content that is complicated, unclearly expressed, and semantically confusing, forcing them to spend a great deal of time searching for the information they really need and several times more effort verifying its accuracy. With the help of machine learning and deep learning, text summarization technology can quickly condense complicated information into a short summary. However, current text summarization techniques still suffer from factual errors and insufficient summary accuracy. This thesis investigates these two problems separately and proposes corresponding solutions: a study of generative text summarization that incorporates chapter information, and a study of a word vector reconstruction-based method for enhancing text understanding.

(1) To address factual errors in generative text summaries, this thesis proposes a method that fuses chapter information. When generating the summary, words appearing in the source chapter are assumed to deserve a higher generation probability, so the probabilities of the original words are adjusted upward through translation and scaling in the decoding stage. This makes the summaries finer-grained and more accurate, and mitigates the factual error problem of generative text summarization with only a small increase in parameters. Since there is no objective evaluation metric for factual errors, 500 samples containing factual errors are selected to build a test set, and the model is evaluated by comparing the error correction rate of this method against other methods. The proposed method achieves a factual error correction rate of 29.2% while increasing the number of parameters by only 3.17%; compared with the knowledge graph-based method and the BART-based error correction model, its parameter increase is only about 8% of theirs. In terms of ROUGE-1, ROUGE-2 and ROUGE-L scores, the proposed method improves on the baseline model T5 Pegasus by 0.33, 0.34 and 0.28 respectively. In addition, this thesis optimizes the vocabulary usage of the pre-trained model, shortening training time by 33.7% while maintaining accuracy.

(2) To address the insufficient accuracy of generative text summarization, this thesis proposes a text understanding enhancement method based on word vector reconstruction. The semantic information of sentences is explicitly fused into word vectors by means of Transformer blocks combined with a CNN. Compared with the implicit fusion of the traditional attention mechanism, the proposed method produces higher-quality word vectors, and its ROUGE-1, ROUGE-2 and ROUGE-L scores improve on the baseline model T5 Pegasus by 0.19, 0.13 and 0.16 respectively, raising the accuracy of generative text summarization. In addition, experiments on microblog texts of different lengths verify that the method outperforms the baseline model across different data volumes.
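To make the decoding-stage adjustment described in (1) concrete, the following is a minimal sketch of boosting the generation probability of tokens that appear in the source document through translation and scaling. The function name, the scale and shift values, and the renormalization step are illustrative assumptions; the abstract does not give the exact formulation used in the thesis.

```python
import torch

def boost_source_tokens(logits, source_token_ids, scale=1.2, shift=1e-4):
    """Raise the generation probability of tokens appearing in the source
    document ("scaling and translation"), then renormalize.

    logits: 1-D tensor of raw decoder logits for the current decoding step.
    source_token_ids: iterable of vocabulary ids found in the input text.
    scale, shift: placeholder scaling and translation factors (assumed values).
    """
    probs = torch.softmax(logits, dim=-1)
    src_ids = torch.tensor(sorted(set(source_token_ids)), dtype=torch.long)
    # Scale (multiplicative) and translate (additive) the source-token probabilities.
    probs[src_ids] = probs[src_ids] * scale + shift
    # Renormalize so the adjusted scores still form a probability distribution.
    return probs / probs.sum()
```

At each decoding step, such an adjusted distribution would replace the original softmax output before sampling or beam search selects the next token.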
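Similarly, the word vector reconstruction in (2) can be illustrated with a short sketch that passes pre-trained token embeddings through a Transformer encoder block, broadcasts a pooled sentence vector back onto every token, and applies a 1-D convolution. The module name, layer sizes, pooling choice, and residual connections are assumptions made for illustration, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class WordVectorReconstructor(nn.Module):
    """Sketch: explicitly fuse sentence-level semantics into token embeddings
    with a Transformer encoder block followed by a 1-D CNN."""

    def __init__(self, d_model=768, n_heads=8, kernel_size=3):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model,
                              kernel_size=kernel_size, padding=kernel_size // 2)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, d_model) from the pre-trained model.
        contextual = self.encoder(token_embeddings)
        # Mean-pool to a sentence vector and add it back onto every token,
        # making the sentence semantics explicit in each word vector.
        sentence_vec = contextual.mean(dim=1, keepdim=True)
        fused = contextual + sentence_vec
        # 1-D convolution over the sequence captures local n-gram patterns.
        out = self.conv(fused.transpose(1, 2)).transpose(1, 2)
        # Residual connection keeps the original embedding signal.
        return token_embeddings + out

# Example: reconstruct embeddings for a batch of 2 sentences of 16 tokens.
emb = torch.randn(2, 16, 768)
reconstructed = WordVectorReconstructor()(emb)  # shape: (2, 16, 768)
```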
Keywords/Search Tags: Generative text summarization, Fusion of textual information, Word vector reconstruction