
Research on Methods of Automatic Text Summarization Based on Information Theory

Posted on: 2021-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: N X Lao
GTID: 2428330611467606
Subject: Software engineering
Abstract/Summary:
The essence of text summarization is a process of information extraction and processing. However, current research on automatic text summarization is often driven by empiricism and pragmatism and lacks effective analysis, guidance, and improvement grounded in an information-theoretic framework. First, building on Peyrard's information-theoretic framework, this paper addresses a limitation of the original TextRank algorithm, which considers only sentence similarity. A new method is proposed that improves the classical TextRank algorithm by using the concept of importance from the information-theoretic framework for text summarization. Moreover, because sentences are not basic semantic units, the concept of sentence importance and a formula for calculating it are put forward.

Then, by simulating the two-stage process by which humans write summaries, a two-stage text summarization method based on information theory is proposed. In the first stage, the TextRank algorithm optimized by information theory extracts key sentences; in the second stage, a Transformer neural network reorganizes them and generates new text. In addition, this paper uses the information-theoretic framework to analyze and guide the selection of semantic units in Chinese and to analyze the abbreviation phenomena common in Chinese, which provides theoretical support for the hybrid word-character model for Chinese text summarization.

Because the pre-trained language model BERT has achieved major breakthroughs on a range of natural language processing tasks, this paper also explores the application of BERT to Chinese text summarization. Exploiting the information-compression nature of text summarization, this paper proposes using a whole-word-masking BERT as the encoder to extract word-level information features and a multi-layer Transformer neural network as the decoder to generate the final summary at the character level. We apply four language models as the Chinese word-level information encoder: BERT_base_Chinese, BERT_wwm_Chinese, BERT_wwm_ext_Chinese and RoBERTa_wwm_ext_Chinese. Experimental results on the LCSTS dataset show that RoBERTa_wwm_ext_Chinese + Transformer outperforms HWC + Transformer.

Additionally, quantum computers, regarded as the most disruptive computing paradigm of the next generation, can accelerate many classical algorithms. This paper therefore also proposes a quantum information TextRank method modeled on the quantum PageRank scheme, which is expected to be of practical significance in the coming era of quantum computing. Finally, despite the above contributions, this paper still has many shortcomings; we therefore analyze its limitations and propose six promising directions for future research.
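The abstract does not give the thesis's own importance formula, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses the word-importance distribution from Peyrard's framework, in which a word is important when it is frequent in the document D but rare in the background knowledge K, i.e. P_{D/K}(w) ∝ P_D(w)^α / P_K(w)^β. The smoothing constant `eps`, the choice of a sentence's mean word importance as its importance, and the use of that importance as the restart (teleport) distribution of the TextRank random walk are all assumptions of this sketch, and every function name is hypothetical. The edge weights use the original TextRank sentence similarity of Mihalcea and Tarau (2004).

```python
import math
from collections import Counter


def word_dist(tokens):
    """Relative frequency P(w) of each token in a list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def importance_dist(doc_tokens, background_tokens, alpha=1.0, beta=1.0, eps=1e-6):
    """Peyrard-style importance: P_{D/K}(w) proportional to P_D(w)^alpha / P_K(w)^beta.

    Words frequent in the document D but rare in the background K score high.
    eps smooths words unseen in K (an assumption of this sketch).
    """
    p_d = word_dist(doc_tokens)
    p_k = word_dist(background_tokens)
    raw = {w: p_d[w] ** alpha / (p_k.get(w, 0.0) + eps) ** beta for w in p_d}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}


def similarity(s1, s2):
    """Original TextRank sentence similarity (Mihalcea & Tarau, 2004)."""
    if not s1 or not s2:
        return 0.0
    overlap = len(set(s1) & set(s2))
    denom = math.log(len(s1)) + math.log(len(s2))
    return overlap / denom if overlap and denom > 0 else 0.0


def importance_textrank(sentences, background_tokens, d=0.85, iters=50):
    """TextRank whose restart distribution is sentence importance (hypothetical)."""
    n = len(sentences)
    if n == 0:
        return []
    doc_tokens = [w for s in sentences for w in s]
    p_imp = importance_dist(doc_tokens, background_tokens)
    # Hypothetical sentence importance: mean word importance, NOT the thesis's formula.
    imp = [sum(p_imp[w] for w in s) / len(s) if s else 0.0 for s in sentences]
    z = sum(imp)
    prior = [v / z for v in imp] if z > 0 else [1.0 / n] * n
    # Weighted undirected similarity graph without self-loops.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out = [sum(row) for row in weights]
    scores = prior[:]
    for _ in range(iters):
        # Score held by sentences with no edges is redistributed according to the prior.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        scores = [
            (1 - d) * prior[i]
            + d * (sum(weights[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
                   + dangling * prior[i])
            for i in range(n)
        ]
    return scores


sentences = [
    "quantum computers accelerate classical algorithms".split(),
    "the weather is nice today".split(),
    "textrank ranks sentences for summarization".split(),
    "quantum textrank ranks sentences".split(),
]
background = "the weather is nice the day is long today".split()
scores = importance_textrank(sentences, background)
best = max(range(len(sentences)), key=scores.__getitem__)
print(best, " ".join(sentences[best]))  # 3 quantum textrank ranks sentences
```

In this toy run the sentence made entirely of background words receives almost no restart mass and no edges, so it ranks last, while the sentence that is both important and well connected ranks first. Plain TextRank corresponds to replacing `prior` with a uniform distribution, which is the similarity-only behavior the thesis sets out to improve.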
Keywords: Information Theory, Automatic Text Summarization, TextRank, Neural Network, Language Model, Quantum Algorithm