
Research And Implementation Of Text Summarization System Based On Pre-Trained Language Model

Posted on: 2023-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Guo
GTID: 2558306914963499
Subject: Computer Science and Technology
Abstract/Summary:
The popularity of mobile terminals has changed the main way people read and obtain information. As the pace of life accelerates, people can only use fragmented time to browse articles quickly and pick out useful information. Text summarization helps readers filter out secondary information and obtain what they need efficiently. This technology has recently made major breakthroughs thanks to the powerful language representation capability of Pre-trained Language Models (PLMs). However, existing extractive summarization techniques still rely on early PLMs, which have weak encoding ability, whereas extracting a summary requires modeling complex sentence semantics and the logical relationships between sentences. This paper studies how to exploit the language knowledge in PLMs to obtain higher-quality sentence representations and improve the accuracy of extractive summarization. The research content is divided into the following three parts:

· Text summarization algorithm based on ALBERT: To address the weak encoding ability of early PLMs and the inconsistent use of special token representations between pre-training and downstream tasks, this paper proposes a text summarization algorithm based on ALBERT. The algorithm uses ALBERT's strong encoding ability to capture the semantic information within each sentence and the logical relationships between sentences. Because the [CLS] representation carries little semantic information, a mean-pooling strategy is proposed to extract full-text feature representations, and a summary that serves as a lossless compression of the original text's information is selected accordingly.

· Dynamic fusion summarization algorithm based on multi-hidden-layer representations: Using only the final output of the PLM as the sentence encoding ignores the language knowledge present in each hidden layer of the PLM. Therefore, this paper uses the long-term memory ability of the Bi-directional Long Short-Term Memory (BiLSTM) network to encode the knowledge scattered across the layers. It integrates the language knowledge of different layers and extracts, to the greatest extent possible, the language knowledge that is useful for summary extraction. In addition, to reduce the time-consuming importance calculation in the BiLSTM model, sentence representations rather than word representations are used as the BiLSTM input. The algorithm integrates more comprehensive language knowledge with less running time and extracts high-quality summaries based on a comprehensive interpretation of the article.

· Design and implementation of a text summarization visualization system: The text summarization system built in this work integrates text clustering with the summarization algorithms above to process information, and visualizes the algorithms' results through a Web system, which greatly improves users' reading efficiency.
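
The mean-pooling idea mentioned in the first part can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (mean_pool, cosine, select_sentences) are hypothetical, the vectors are plain Python lists standing in for PLM outputs, and ranking sentences by cosine similarity to the pooled document vector is an assumed selection rule that the abstract does not specify.

```python
from math import sqrt


def mean_pool(token_vectors, attention_mask):
    """Average the vectors whose mask entry is 1, skipping padding positions."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        return total  # nothing to average: return the zero vector
    return [value / count for value in total]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_sentences(sentence_vectors, k):
    """Assumed selection rule (not from the thesis): keep the k sentences
    closest to the mean-pooled document vector, in document order."""
    doc_vector = mean_pool(sentence_vectors, [1] * len(sentence_vectors))
    ranked = sorted(
        range(len(sentence_vectors)),
        key=lambda i: cosine(sentence_vectors[i], doc_vector),
        reverse=True,
    )
    return sorted(ranked[:k])
```

In this sketch, mean_pool would be applied first to the token vectors of each sentence (with the padding mask) and then to the resulting sentence vectors to form a full-text representation, in place of relying on the [CLS] vector alone.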
Keywords/Search Tags:Extractive Summarization, Language Knowledge, Pre-trained Language Models