Research On Extractive Summarization Of Scientific And Technological Information Text Based On Deep Learning

Posted on: 2022-07-15  Degree: Master  Type: Thesis
Country: China  Candidate: J W Han
GTID: 2518306572991349  Subject: Computer Science and Technology
Abstract/Summary:
The arrival of the big data era has diversified the ways scientific and technological information texts are produced and accessed, and the number of such texts has grown exponentially. Automatic text summarization uses computers to compress these texts into short abstracts, and reading the abstracts greatly improves the efficiency of science and technology researchers. Traditional extractive summarization methods suffer from limited extraction of the text's semantic features, summaries that deviate from the theme of the source text, and difficulty in determining the threshold on the output, all of which lead to low-quality summaries. In response to these problems, this thesis decomposes extractive summarization into two stages: candidate summary set generation and best summary selection.

In the first stage, candidate summary set generation, an optimized semantic enhancement graph model is designed. The model is composed of an optimized TextRank graph and a heterogeneous graph. By expanding the semantic features of scientific and technological information texts, it addresses the problem that neural networks are difficult to converge on long text input, and it generates high-quality candidate summary sets.

In the second stage, summary selection, a best-summary selection method based on the BERT model is studied. Sentence extraction is recast as a text matching problem in semantic space: each summary in the candidate set is semantically matched against the source document, and the text-level summary that best fits the global semantics is selected.

The model is experimentally verified on multiple public data sets such as CNN/Daily Mail. The extracted summaries read fluently and coherently, and they score higher than other classic summarization methods in recall and topic fit. The model also improves computational performance to a certain extent and can significantly improve summary quality for long text input.
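
The two-stage idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's model: plain TextRank over bag-of-words cosine similarity stands in for the optimized TextRank graph and heterogeneous graph, and the same bag-of-words vectors stand in for BERT embeddings in the matching stage. All function names and parameters (`textrank_scores`, `candidate_summaries`, `best_summary`, `top_k`, `summary_len`) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over a similarity graph (plain TextRank)."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Edge weight = similarity between two different sentences.
    weights = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def candidate_summaries(sentences, top_k=4, summary_len=2):
    """Stage 1 (assumed form): keep the top_k sentences, enumerate combinations."""
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Sort indices so each candidate keeps the source order of its sentences.
    return [tuple(sorted(c)) for c in combinations(top, summary_len)]


def best_summary(sentences, candidates):
    """Stage 2 (assumed form): pick the candidate closest to the whole document."""
    doc_vec = Counter(tokenize(" ".join(sentences)))

    def match(cand):
        cand_vec = Counter(tokenize(" ".join(sentences[i] for i in cand)))
        return cosine(cand_vec, doc_vec)

    return max(candidates, key=match)
```

Calling `best_summary(sentences, candidate_summaries(sentences))` returns a tuple of sentence indices in source order. Scoring whole candidate summaries against the document, instead of thresholding individual sentence scores, reflects the abstract's point that a per-sentence output threshold is hard to set; in a closer reproduction the `cosine` calls in stage 2 would operate on BERT encodings of the candidate and the document.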
Keywords/Search Tags: extractive summarization, optimized TextRank graph, heterogeneous graph, BERT model