
Research And Implementation Of Text Summarization Based On Attention Mechanism

Posted on: 2020-01-04    Degree: Master    Type: Thesis
Country: China    Candidate: Y T Liao    Full Text: PDF
GTID: 2417330590982854    Subject: Applied Statistics
Abstract/Summary:
With the advent of the big-data era, information has exploded: hundreds of millions of pieces of data are generated on the Internet every day, so people inevitably face the problem of information overload. With the popularity of communication channels such as We-Media, the volume of text keeps growing. Extracting short topics from cluttered text, that is, automatic text summarization, is of great significance for obtaining valid information quickly and accurately from massive data.

A text summary is a technique that condenses the original information into a concise text. To convert a long text into a short summary, the original input is encoded into a semantic vector through the seq2seq framework, and that semantic vector is then decoded to generate the output. In this paper, a bidirectional LSTM network is used for the encoder of the encoder-decoder framework, and a unidirectional LSTM network for the decoder. However, for long text sequences, a single semantic encoding is not enough to represent all of the text information, so the attention mechanism is introduced and the model is improved on this basis. The specific improvement is twofold. First, the importance score of each sentence is computed with the TextRank algorithm combined with the sentence's positional features and novelty, and the Top-K highest-scoring sentences are selected as the input sequence. Second, instead of the traditional practice of attending over the whole text, attention is restricted to local information around the alignment position. In this way, noise and computation time are reduced and the accuracy of the summary is improved.

Finally, experiments are designed and the results analyzed. After the experimental data are introduced and preprocessed, two contrast experiments are designed; in addition, both word-based and character-based methods are used to construct the word-vector representation of the input sequence. The quality of the generated summaries is evaluated with two indicators, ROUGE-1 and ROUGE-2, and the feasibility of the model is verified by comparing the experimental results.
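The sentence-selection step described above can be sketched as follows. This is a minimal, self-contained illustration of TextRank scoring combined with a positional bonus and a novelty filter for Top-K selection; the function names, the positional weight, and the novelty threshold are hypothetical and not taken from the thesis's implementation.

```python
import math

def sentence_similarity(s1, s2):
    """Word-overlap similarity, as in the original TextRank formulation."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0:
        return 0.0
    return overlap / (math.log(len(w1) + 1) + math.log(len(w2) + 1))

def textrank_scores(sentences, d=0.85, iters=50):
    """Plain power-iteration PageRank over the sentence-similarity graph."""
    n = len(sentences)
    sim = [[sentence_similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    row_sums = [sum(row) or 1.0 for row in sim]  # avoid division by zero
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - d) / n
                  + d * sum(sim[j][i] / row_sums[j] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores

def top_k_sentences(sentences, k, pos_weight=0.15, novelty_threshold=1.0):
    """Add a positional bonus (earlier sentences score higher), drop
    candidates too similar to already-selected ones (novelty), and
    return the Top-K sentences in original document order."""
    base = textrank_scores(sentences)
    n = len(sentences)
    scored = [(base[i] + pos_weight * (n - i) / n, i) for i in range(n)]
    selected = []
    for score, i in sorted(scored, reverse=True):
        if any(sentence_similarity(sentences[i], sentences[j]) > novelty_threshold
               for j in selected):
            continue  # novelty filter: too close to a kept sentence
        selected.append(i)
        if len(selected) == k:
            break
    return [sentences[i] for i in sorted(selected)]
```

In this sketch the selected sentences would then be concatenated to form the (shortened) input sequence fed to the encoder, which is the role Top-K selection plays in the model described above.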
Keywords/Search Tags: Textsum, seq2seq, attention, TextRank
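The restriction of attention to local information around an alignment position, as mentioned in the abstract, can be illustrated with a Luong-style local attention sketch: only encoder positions inside a window around the predicted alignment position p_t are scored, and the softmaxed alignment is weighted by a Gaussian centred on p_t. All names and the choice of sigma = D/2 are illustrative assumptions, not the thesis's actual code.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def local_attention(enc_states, dec_state, p_t, D=2):
    """Score only encoder positions in [p_t - D, p_t + D], softmax them,
    then damp each weight with a Gaussian centred on p_t (sigma = D / 2).
    Returns the context vector and the local attention weights."""
    n = len(enc_states)
    lo, hi = max(0, p_t - D), min(n - 1, p_t + D)
    idx = list(range(lo, hi + 1))
    sigma = D / 2
    raw = [dot(enc_states[s], dec_state) for s in idx]        # alignment scores
    align = softmax(raw)                                      # window-local softmax
    weights = [a * math.exp(-((s - p_t) ** 2) / (2 * sigma ** 2))
               for a, s in zip(align, idx)]                   # Gaussian damping
    dim = len(enc_states[0])
    context = [sum(w * enc_states[s][d] for w, s in zip(weights, idx))
               for d in range(dim)]
    return context, weights
```

Because only 2D + 1 positions are scored instead of the full sequence, noise from distant tokens and per-step computation both shrink, which matches the motivation given in the abstract.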