Paraphrase generation is of significant research value in natural language processing and supports downstream tasks such as question-answering dialogue, machine translation, and adversarial example generation. With the success of deep learning in text generation, paraphrasing has gradually migrated from rule-based template matching to deep learning-based generation. Paraphrase generation is analogous to machine translation, except that it maps a sentence to a semantically equivalent sentence within the same language; the earliest research therefore applied sequence models from machine translation to paraphrase generation and made good progress. In practical application scenarios, however, a paraphrase is usually required to differ noticeably in surface form from the original sentence, whereas paraphrases generated by translation-style models tend to be very similar to their inputs. Improving the expressive variability and diversity of generated paraphrases requires syntactically controllable paraphrase generation models that select diverse grammatical structures to guide generation.

To address this issue, this paper investigates a syntax-controlled paraphrase generation scheme based on supervised learning, compares existing models and proposes improvements, and verifies the effectiveness of the model through both subjective and objective evaluation metrics. In addition, to adapt to diverse linguistic scenarios and to avoid the problem of insufficient parallel corpora of original sentences and paraphrases, this paper develops an unsupervised syntax-controlled paraphrase generation model to achieve more general paraphrase generation. Overall, the core contributions of this paper are threefold.

(1) Applying syntactically controllable deep learning-based paraphrase models to the Chinese language domain. Current research on Chinese paraphrase generation mainly relies on rules or templates, while research applying deep learning to paraphrase generation has focused on English and Japanese, and Chinese-specific models are lacking. This paper studies paraphrase generation models in both Chinese and English settings, constructing features and training models at word and character granularity for the two languages.

(2) Investigating unsupervised learning models to improve generalization. Supervised models rely on parallel corpora for training and therefore cannot achieve effective paraphrase generation in large-scale language scenarios, which limits their portability across languages. To diversify the space of paraphrase generation schemes, this paper optimizes and improves an unsupervised syntax-controlled paraphrase generation model on the basis of existing models, and verifies its generalization ability in multiple language scenarios.

(3) Constructing a complete Chinese-English dataset annotated with grammatical structure. To cover multilingual scenarios, we collected datasets spanning news, movie reviews, novels, and other domains, used the Stanford parser to obtain the serialized grammar-tree structure of each sentence, and constructed Chinese and English parallel corpora for the various scenarios.
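Concretely, the serialized grammar trees produced by the Stanford parser can be truncated to a fixed depth to yield compact syntactic templates for guiding generation. The sketch below is illustrative only: the bracketed input format matches Penn-Treebank-style parser output, but the function names and the fixed-depth template convention are assumptions for this example, not details specified in this paper.

```python
import re

def tokenize(tree_str):
    """Split a bracketed parse string into '(', ')', and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", tree_str)

def parse(tokens, i=0):
    """Recursively build (label, children) tuples from token position i."""
    assert tokens[i] == "("
    label = tokens[i + 1]
    i += 2
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = parse(tokens, i)
            children.append(child)
        else:
            children.append(tokens[i])  # a terminal word
            i += 1
    return (label, children), i + 1

def template(node, depth):
    """Serialize the tree down to `depth` levels, dropping the words,
    to form a syntactic template such as '(ROOT (S (NP) (VP)))'."""
    label, children = node
    subtrees = [c for c in children if isinstance(c, tuple)]
    if depth <= 1 or not subtrees:
        return f"({label})"
    inner = " ".join(template(c, depth - 1) for c in subtrees)
    return f"({label} {inner})"

# Example: a Stanford-parser-style tree for "She left early".
tree = "(ROOT (S (NP (PRP She)) (VP (VBD left) (ADVP (RB early)))))"
root, _ = parse(tokenize(tree))
print(template(root, 3))  # (ROOT (S (NP) (VP)))
```

At generation time, such a template can serve as an additional conditioning sequence fed to the decoder alongside the source sentence, so that the output paraphrase follows the chosen grammatical structure rather than echoing the input's.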