Analysis And Research Of Long Text Summary Generation Based On Deep Learning

Posted on:2024-07-03

Degree:Master

Type:Thesis

Country:China

Candidate:J L Zeng

Full Text:PDF

GTID:2568307100489374

Subject:Electronic information

Abstract/Summary:

We live in an era of data explosion: data on the Web is growing exponentially, which has led to information overload and made it increasingly important to extract key information from massive amounts of data. Automatic text summarization uses a machine to compress a text into a short overview of the whole, and it has become one of the popular techniques in natural language processing. In recent years, automatic text summarization has performed well on short texts and has received a lot of attention. However, for reasons such as machine performance and model complexity, summarizing long texts still suffers from problems such as information redundancy, semantic discrepancies, and missing key information. This thesis therefore proposes a two-stage long text summarization model to address these problems. The main work is as follows:

First, a long text compression model based on TextRank with multi-feature fusion is proposed. Combining text features with the TextRank algorithm mitigates one-sided sentence scoring, and the MMR algorithm reduces redundant information among the extracted sentences. Specifically, using the text feature formula proposed in the thesis, each sentence is scored on three aspects: its position, the words it contains, and its length. These scores quantify the importance and contribution of each sentence in the text. The text is then pre-processed with the BERT model to convert each sentence into a sentence vector, and cosine similarity is used to measure the similarity between sentences. The individual indices are then weighted and combined to give the final score of each sentence. Finally, the MMR algorithm is applied during sentence extraction to control redundancy, yielding the set of candidate sentences.

Second, a text summarization model that fuses BertSum with MatchSum is proposed. The pre-trained model BertSum first refines the set of candidate sentences into a set of key sentences, from which candidate summaries with more key content are formed. MatchSum then matches the candidate summaries against the source document in semantic space to rank them, and the top-ranked candidate is taken as the final summary.

A series of experiments verifies the feasibility of the two-stage long text summarization model on long text datasets. The model shows some improvement on the ROUGE metrics, which is of research value for the field of automatic text summarization. In addition, the model offers a new research idea for automatic text summarization and can serve as a reference for future related research.
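The first-stage compression step described in the abstract (TextRank scores fused with text-feature scores, followed by MMR, i.e. maximal marginal relevance, for redundancy control) might look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the feature formula or the fusion weights, so `feature_scores`, `alpha`, and `lam` are hypothetical inputs, and plain lists of floats stand in for BERT sentence vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(sim, damping=0.85, iters=50):
    """Power iteration over a sentence-similarity graph (self-links ignored)."""
    n = len(sim)
    scores = [1.0 / n] * n
    # Total outgoing edge weight of each sentence, excluding itself.
    out = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out[j] * scores[j]
                            for j in range(n) if j != i and out[j] > 0)
            for i in range(n)
        ]
    return scores


def mmr_select(relevance, sim, k, lam=0.5):
    """Greedy MMR: balance relevance against similarity to sentences already picked."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


def compress(vectors, feature_scores, k, alpha=0.5, lam=0.5):
    """Return indices of k candidate sentences, in document order.

    vectors:        one embedding per sentence (stand-in for BERT sentence vectors)
    feature_scores: one score in [0, 1] per sentence (hypothetical stand-in for the
                    thesis's position / word / length features)
    alpha:          assumed weight of TextRank versus the feature score
    lam:            assumed MMR trade-off (1.0 means relevance only)
    """
    # Negative similarities are clamped to zero so edge weights stay non-negative.
    sim = [[max(0.0, cosine(u, v)) for v in vectors] for u in vectors]
    tr = textrank(sim)
    top = max(tr)
    # Rescale TextRank to [0, 1] so it is comparable with the feature scores.
    relevance = [alpha * t / top + (1 - alpha) * f
                 for t, f in zip(tr, feature_scores)]
    return sorted(mmr_select(relevance, sim, k, lam))
```

Calling `compress(vectors, feature_scores, k)` with a smaller `lam` penalizes near-duplicate sentences more strongly, while `lam=1.0` reduces the selection to a plain top-k by fused score. In practice the vectors would come from a BERT encoder and the weights would be tuned on a validation set.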

Keywords/Search Tags:

text summary, TextRank, MMR, BertSum, MatchSum

Related items

1	Research On Chinese Text Summary Extraction Algorithm Based On TextRank
2	Research And Application Of Text Summarization Technology
3	Research On The Method Of Extracting The Tag From Chinese Compositions Of Primary School Based On Text Automatic Abstract
4	TMSA:A Two-stage Automatic Summary Generation Model
5	Automatic Summary Extraction Based On TF-IDF And TextRank
6	Research And Application Of Abstract Method Of Chinese Web Text Based On Seq2Seq Framework
7	Chinese Single Document Abstract Research Based On Doc2Vec And Improved TextRank
8	An Emotion Summarization Method Based On Semantic And Affective Relations In Chinese Micro-blog
9	Extractive Abstracts Of Long Chinese Patent Texts Based On Improved Bertsum Model
10	Research And Implementation Of Key Technology For Intelligent Generation Of Conference Summary