
Research On Cross-Modal Natural Language Generation

Posted on: 2022-04-24 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S Yang | Full Text: PDF
GTID: 1528307169477394 | Subject: Computer Science and Technology
Abstract/Summary:
As an important research topic in natural language processing (NLP) and artificial intelligence, cross-modal natural language generation aims to convert non-natural-language data into natural language. Its input can be a list, a graph, or an image, and is therefore diverse in modality. Because cross-modal natural language generation underpins many NLP tasks, it has attracted substantial attention from both academia and industry. Meanwhile, neural generation methods have achieved breakthroughs with the help of large-scale datasets and learning algorithms. However, such methods can only model sequences and cannot handle data of other modalities directly; they also fall short in generating diverse and consistent text. This paper therefore proposes solutions to the critical issues in cross-modal natural language generation. Its contributions are as follows:

(1) For the cross-modal generation task of “abstract meaning representation to text”, this paper proposes a distant-context-aware generation model. Abstract meaning representation is a classical graph-structured representation of text semantics. Graph neural networks can encode such graphs, but they are too shallow to capture the context of distant neighbors because of over-smoothing, which leads to inaccurate text. This paper proposes three solutions: a bidirectional Transformer-based graph encoder, a node receptive-field expansion mechanism, and an encoding fusion mechanism. Experiments on two datasets, LDC2015E86 and LDC2017T10, show that these methods significantly improve BLEU and METEOR scores by encoding distant context and achieve new state-of-the-art results.

(2) For the cross-modal generation task of “structured data to text”, this paper proposes a dynamic-planning-based generation model. Existing structured-data-to-text methods usually involve two stages: data planning and text realization. Because planning is completed before realization begins, the model cannot adjust the data plan when errors occur during realization, and therefore lacks adaptability. In addition, the scarcity of labeled data plans and the absence of evaluation metrics for planning remain open issues. To gain adaptability and compensate for the lack of labeled plans, this paper proposes a dynamic-planning-based generation model and a reinforcement-learning-based, likelihood-driven training strategy; it also devises a plan evaluation metric to assess data plans quantitatively. Experimental results on E2E and EPW show that the dynamic-planning-based model produces both data plans and text of high quality, and that the likelihood-driven training strategy brings the model close to the performance of supervised training.

(3) For the cross-modal generation task of “image to text”, this paper proposes a new task, visual question-answer pair generation (VQAPG). VQAPG aims to generate questions and answers simultaneously from an image; its main challenges are diversity and consistency. This paper proposes three VQAPG models based on different paradigms and introduces latent variables through variational inference to achieve diversity. It further proposes two mechanisms, region representation scaling and attention alignment, to improve consistency. Experimental results on VQA2.0 and Visual7W show that these methods generate diverse and consistent question-answer pairs. Moreover, VQAPG can be used to improve visual question generation and visual question answering.
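The abstract names variational inference as the source of diversity in VQAPG but gives no implementation details. Purely as an illustration, the following is a minimal PyTorch sketch of a conditional-VAE-style latent variable with the reparameterization trick and a KL regularizer; all module names, dimensions, and the loss weighting are assumptions made for this example, not the dissertation's actual code.

```python
# Minimal sketch (not the dissertation's code): a CVAE-style latent variable
# for diverse generation, conditioned on image/context features.
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """Maps a context encoding to the parameters of q(z | x)."""
    def __init__(self, ctx_dim: int = 512, z_dim: int = 64):
        super().__init__()
        self.to_mu = nn.Linear(ctx_dim, z_dim)
        self.to_logvar = nn.Linear(ctx_dim, z_dim)

    def forward(self, ctx: torch.Tensor):
        mu, logvar = self.to_mu(ctx), self.to_logvar(ctx)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

def kl_regularizer(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL(q(z|x) || N(0, I)), averaged over the batch."""
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)).mean()

if __name__ == "__main__":
    enc = LatentEncoder()
    ctx = torch.randn(4, 512)          # placeholder image/context features
    z, mu, logvar = enc(ctx)
    print(z.shape, kl_regularizer(mu, logvar).item())
```

At inference time, sampling different z vectors for the same image and feeding them to the decoder is what yields different question-answer pairs; during training, the reconstruction loss is combined with a weighted `kl_regularizer(mu, logvar)` term.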
(4) For the cross-modal generation task of “annotated sequence to text”, this paper proposes an event generation method based on a pre-trained language model. “Annotated sequence to text” generation aims to produce new labeled data from existing labeled data and thereby alleviate the scarcity of labeled samples in information extraction. However, owing to the lack of external knowledge, the generated text is typically monotonous and insufficient to improve the generalization ability of extraction models. Focusing on event extraction, this paper proposes an event generation method that produces annotated corpora with BERT, a pre-trained language model, and can generate several times as many samples as already exist. To address the problem of overlapping arguments in event extraction, the paper also proposes a BERT-based event extraction model. Experimental results on ACE2005 demonstrate that the extraction model surpasses most peer work, and that incorporating the generated corpus yields further significant improvement. The paper obtains new state-of-the-art results on event extraction, pushing the F1 score of trigger classification to 81.1% and that of argument classification to 58.9%.

In summary, this paper studies cross-modal natural language generation from four perspectives and proposes solutions to the corresponding problems. For “abstract meaning representation to text”, it proposes a distant-context-aware model with two mechanisms for capturing information from distant neighbors; for “structured data to text”, it targets adaptability with a dynamic-planning-based generation model, a likelihood-driven training strategy for the planner, and a plan evaluation metric; for “image to text”, it proposes the new VQAPG task and three models; and for “annotated sequence to text”, it addresses the lack of external knowledge with an event generation method based on a pre-trained language model. These methods are of great significance to the development of cross-modal natural language generation.
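As an illustrative companion to contribution (4), the sketch below shows one generic way to use a pre-trained BERT to rewrite non-annotated tokens of a labeled sentence while keeping the event trigger and argument spans intact, so the annotations carry over to the generated sample. The masking strategy, the helper function, and the example sentence are assumptions for illustration; the dissertation's concrete generation procedure is not described in this abstract.

```python
# Illustrative sketch only: BERT masked-token replacement to create new
# annotated event samples, keeping trigger/argument spans unchanged.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment(tokens, protected, top_k=3):
    """Replace one unprotected token at a time with BERT's top predictions.

    tokens: list of word tokens of a labeled sentence.
    protected: indices covering the event trigger and arguments, which must
               stay unchanged so the existing annotations remain valid.
    """
    new_sentences = []
    for i in range(len(tokens)):
        if i in protected:
            continue
        masked = tokens[:i] + [fill_mask.tokenizer.mask_token] + tokens[i + 1:]
        for pred in fill_mask(" ".join(masked), top_k=top_k):
            candidate = tokens[:i] + [pred["token_str"]] + tokens[i + 1:]
            if candidate != tokens:
                new_sentences.append(" ".join(candidate))
    return new_sentences

# Example: indices 2 and 5 mark the trigger "fired" and the argument "manager".
print(augment("The company fired its senior manager yesterday".split(), {2, 5}))
```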
Keywords/Search Tags:Cross Modal, Natural Language Generation, Abstract Meaning Representation to Text, Structured Data to Text, Image to Text, Annotated Sequence to Text