Analysis And Research Of Long Text Summary Generation Based On Deep Learning

Posted on: 2024-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zeng
Full Text: PDF
GTID: 2568307100489374
Subject: Electronic information
Abstract/Summary:
We live in an era of data explosion, with data on the Web growing exponentially. The resulting information overload makes it ever more important to extract key information from these massive volumes of data. Automatic text summarization uses a machine to compress a text into a short overview of the whole, and it has become one of the popular techniques in natural language processing. In recent years, automatic summarization has performed well on short texts and has received a lot of attention. However, for reasons such as machine performance and model complexity, summarizing long texts still suffers from problems such as information redundancy, semantic discrepancies, and missing key information. This thesis therefore proposes a two-stage long text summarization model to address these problems. The main work is as follows:

First, a long text compression model based on TextRank with multi-feature fusion is proposed. Combining text features with the TextRank algorithm mitigates the one-sidedness of sentence scoring, and the MMR algorithm reduces redundant information among the extracted sentences. Specifically, using the text feature formulas proposed in the thesis, each sentence is scored on three aspects: its position, the words it contains, and its length. These scores quantify the importance and contribution of each sentence in the text. The BERT model is then used to convert each sentence into a sentence vector, and cosine similarity is used to measure the similarity between sentences. The individual scores are then weighted and combined into a final score for each sentence. Finally, the MMR algorithm is applied during sentence extraction to control redundancy, yielding the set of candidate sentences.

Second, a text summarization model that fuses BertSum with MatchSum is proposed. The pre-trained model BertSum first refines the set of candidate sentences into a set of key sentences, from which candidate summaries richer in key content are formed. MatchSum then matches the candidate summaries against the source document in semantic space to rank them, and the top-ranked candidate is taken as the final summary.

A series of experiments verifies the feasibility of the two-stage model on long text datasets. The model shows some improvement on the ROUGE metrics, which is of research value for automatic text summarization. The model also offers a new research idea for automatic summarization and a reference for future related work.
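
The first stage described above (similarity graph, TextRank, feature fusion, MMR) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the feature formulas or the fusion weights, so the feature scores are passed in precomputed, and the function names and the parameters `alpha`, `lam`, and `damping` are hypothetical. Sentence vectors are assumed to be supplied by some encoder (the thesis uses BERT) as plain lists of floats.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(vectors: List[Vector]) -> List[List[float]]:
    # Negative cosines are clamped to 0 so they can serve as edge weights.
    return [[max(0.0, cosine(u, v)) for v in vectors] for u in vectors]


def textrank(sim: List[List[float]], damping: float = 0.85,
             iters: int = 50) -> List[float]:
    """Weighted PageRank over the sentence similarity graph (no self-loops)."""
    n = len(sim)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    # Total outgoing edge weight of each sentence, excluding itself.
    out = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out[j] * scores[j]
                for j in range(n) if j != i and out[j] > 0)
            for i in range(n)
        ]
    return scores


def normalize(xs: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; all zeros if every value is equal."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def fuse(tr_scores: List[float], feature_scores: List[float],
         alpha: float = 0.5) -> List[float]:
    # Hypothetical fusion: weighted sum of normalized TextRank and
    # feature scores. The thesis's actual weighting is not given.
    return [alpha * t + (1 - alpha) * f
            for t, f in zip(normalize(tr_scores), normalize(feature_scores))]


def mmr_select(scores: List[float], sim: List[List[float]], k: int,
               lam: float = 0.7) -> List[int]:
    """Greedy MMR: trade off sentence score against redundancy with
    the sentences already selected. Returns indices in selection order."""
    selected: List[int] = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


def select_candidates(vectors: List[Vector], feature_scores: List[float],
                      k: int, alpha: float = 0.5,
                      lam: float = 0.7) -> List[int]:
    """Stage-one sketch: similarity graph -> TextRank -> fusion -> MMR."""
    sim = similarity_matrix(vectors)
    fused = fuse(textrank(sim), feature_scores, alpha)
    return mmr_select(fused, sim, k, lam)
```

Because the fused scores are min-max normalized and the similarities are clamped to [0, 1], the two terms in the MMR criterion are on comparable scales. A lower `lam` penalizes redundancy more strongly, so near-duplicate sentences are less likely to be selected together.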
Keywords/Search Tags: text summary, TextRank, MMR, BertSum, MatchSum