Semantic text exchange is the task of changing the content of a text while preserving its style. It supports a wide range of natural language processing applications, including data augmentation, adversarial attacks on text, and conversational systems, all of which have become active areas of NLP research in recent years. However, existing semantic text exchange methods suffer from several problems: the fluency of the generated text is limited; the semantics of the final sentence are restricted to candidate words or phrases, so the output is not diverse enough; and the sequence-to-sequence model must be trained on a dataset dedicated to filling masked sentences, which increases the cost and time of the whole process. If the quality of that dataset is poor, measures such as the fluency of the resulting sentences also suffer. In addition, the original sentence may contain no entity that is semantically similar to the replacement entity, producing an unreasonable match between the generated part and the unmodified part of the sentence. Because of these shortcomings, existing semantic text exchange methods generate text of limited quality.

To address these problems, this paper combines the pre-trained BART language model with a word-replacement method based on the lexical database WordNet to develop a new semantic text exchange method, and proposes two sentence-generation strategies that replace the target semantics while retaining the rest of the sentence's information, so as to generate fluent and semantically diverse sentences at a reduced generation cost. We combine a three-word substitution pattern with a pre-trained text-infilling model to generate text. The main research content and innovative work of this paper are as follows:

(1) Analysis of semantic text exchange methods trained on specific datasets. We first briefly analyze the current typical semantic text exchange methods that rely on training on specific datasets, including their architecture and workflow, summarize their existing defects, and propose effective solutions to these defects. The methods reviewed here serve as reference points for the experimental comparison with the methods proposed later.

(2) A semantic text exchange method based on the pre-trained BART language model. Targeting the high training cost and insufficient fluency of existing semantic text exchange methods, we propose a method that uses the pre-trained BART language model as the text-infilling module. Experimental results on Amazon Reviews, Yelp Reviews, and the News Category Dataset show that the proposed method effectively improves the fluency of the generated text. Moreover, because the pre-trained BART model performs the text-infilling task directly, no additional training of an infilling model on the target dataset is required, which saves model-training cost.

(3) A semantic text exchange method based on word replacement and the pre-trained BART language model. To address the limited semantic range of sentences generated by existing methods (insufficient diversity, limited fluency, and high training cost), we propose a semantic text exchange method that combines WordNet-based word replacement with the pre-trained BART language model. Experimental results on Amazon Reviews, Yelp Reviews, and the News Category Dataset show that the sentences generated by our method are more fluent and more diverse, and that no infilling model needs to be trained on specific datasets, which greatly reduces the cost of text generation. In addition, our method performs well in preserving the sentiment of the original sentence and in the semantic similarity between the replacement word and the generated sentence.
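As a rough illustration of the pipeline outlined above, the sketch below swaps an entity in the original sentence, masks up to three words tied to the old entity, and builds the masked template that a pre-trained infilling model such as BART would then complete. The tiny relatedness table (standing in for WordNet), the entity names, and the exact masking rule are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of the semantic text exchange pipeline (assumed details):
# a toy relatedness set stands in for WordNet lookups, and the resulting
# masked template would be handed to a pre-trained infilling model such
# as BART; here we only construct the template.

MASK = "<mask>"  # BART-style infill token

# Toy stand-in for WordNet-derived words related to the ORIGINAL entity.
RELATED_TO_PHONE = {"battery", "screen", "charger"}

def build_masked_template(sentence, old_entity, new_entity,
                          related_words, max_masks=3):
    """Swap the entity, then mask up to `max_masks` words tied to the old
    entity so the infilling model can rewrite them to fit the new entity
    (a rough take on the three-word substitution pattern)."""
    out, masked = [], 0
    for tok in sentence.split():
        bare = tok.strip(".,!?").lower()
        if bare == old_entity.lower():
            out.append(tok.replace(old_entity, new_entity))
        elif bare in related_words and masked < max_masks:
            out.append(MASK)  # punctuation on masked tokens is dropped in this sketch
            masked += 1
        else:
            out.append(tok)
    return " ".join(out)

template = build_masked_template(
    "This phone has a great battery and a bright screen.",
    old_entity="phone", new_entity="laptop",
    related_words=RELATED_TO_PHONE)
print(template)
# → This laptop has a great <mask> and a bright <mask>
```

In a real run, `template` would be passed to the pre-trained BART model's fill-mask interface, which generates replacements for the masked slots that are fluent in context and consistent with the new entity, avoiding any task-specific training.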