Research And Implementation Of Text Summarization System Based On Pre-Trained Language Model

Posted on:2023-11-24

Degree:Master

Type:Thesis

Country:China

Candidate:W Y Guo


GTID:2558306914963499

Subject:Computer Science and Technology

Abstract/Summary:


The popularity of mobile terminals has changed the main way people read and obtain information. As the pace of life accelerates, people can only use fragmented time to browse articles quickly and pick out useful information. Text summarization helps readers filter out secondary information and obtain what they need efficiently. The technology has recently made major breakthroughs thanks to the powerful language representation capability of Pre-trained Language Models (PLMs). However, existing extractive summarization techniques still rely on early PLMs, which have weak encoding ability, whereas extracting a summary requires considering complex sentence-level semantics and the logical relationships between sentences. This thesis studies how to exploit the language knowledge in PLMs to obtain higher-quality sentence representations and improve the accuracy of extractive summarization. The research covers the following three points:

· Text summarization algorithm based on ALBERT: To address the weak encoding ability of early PLMs and the inconsistent use of special-token representations between pre-training and downstream tasks, this thesis proposes a text summarization algorithm based on ALBERT. The algorithm uses ALBERT's strong encoding ability to capture the semantic information within sentences and the logical relationships between sentences. Because the [CLS] representation carries little semantic information, a mean-pooling strategy is proposed to extract full-text feature representations, and a summary that serves as a lossless compression of the original text information is selected accordingly.

· Dynamic fusion summarization algorithm based on multi-hidden-layer representations: Using only the final output of the PLM as the sentence encoding ignores the language knowledge present in each hidden layer. This thesis therefore uses the long-term memory ability of a Bi-directional Long Short-Term Memory (BiLSTM) network to encode the knowledge scattered across the layers. It integrates the language knowledge of different layers and extracts, to the greatest extent possible, the knowledge that is effective for summary extraction. In addition, to reduce the long running time the BiLSTM needs to compute importance, sentence representations rather than word representations are used as its input. The algorithm integrates more comprehensive language knowledge in less running time and extracts high-quality summaries based on a comprehensive reading of the article.

· Design and implementation of a text summarization visualization system: The system built in this work integrates text clustering with the summarization algorithms above to process information and visualizes the results through a Web system, which greatly improves users' reading efficiency.
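The mean-pooling idea in the first point can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `mean_pool`, `cosine`, and `rank_sentences` are hypothetical, plain Python lists stand in for ALBERT hidden states, and ranking sentences by cosine similarity to the pooled full-text vector is an assumed scoring rule that the abstract does not specify.

```python
import math


def mean_pool(token_vectors, mask):
    """Average the vectors whose mask entry is 1 (real tokens, not padding)."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_sentences(sentence_vectors, top_k):
    """Assumed scoring rule: keep the top_k sentences closest to the full-text vector.

    The full-text vector is the mean of all sentence vectors. Selected indices
    are returned in document order.
    """
    doc_vector = mean_pool(sentence_vectors, [1] * len(sentence_vectors))
    scores = [cosine(vec, doc_vector) for vec in sentence_vectors]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:top_k])


# Toy data: three "token" vectors, the last one being padding.
tokens = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]
sentence_vector = mean_pool(tokens, [1, 1, 0])

# Toy data: three "sentence" vectors; keep the two most representative.
selected = rank_sentences([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], top_k=2)
```

Masking out padding before averaging keeps padded positions from diluting the sentence vector, which is the usual reason to prefer masked mean pooling over a plain average.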

Keywords/Search Tags:

Extractive Summarization, Language Knowledge, Pre-trained Language Models


Related items

1	Research On Dialogue Summarization Technology Based On Pre-Trained Language Models
2	Research On Compression Techniques Of Pre-trained Language Models
3	Research On Knowledge-Driven Pre-Trained Language Models
4	Research On Knowledge-Enhanced Pre-trained Language Models
5	Algorithmic Studies On Knowledge Enhanced Pre-trained Language Models
6	Research On News Text Summarization Algorithm Based On Pre-trained Language Model
7	Research On Abstractive Text Summarization Based On Pre-trained Language Model
8	Research Based On Pre-trained Language Models And Knowledge Enhancement For Aspect-based Sentiment Analysis
9	Improvement And Compression Of Pre-Trained Language Models For User-Generated Texts
10	Research On Extractive Summarization Methods For Cambodian Language Multi-documents