Multimodal Question Generation Based On Graph Attention Network

Posted on: 2022-10-24 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Fu | Full Text: PDF
GTID: 2518306536453234 | Subject: Control Engineering
Abstract/Summary:
With the development of deep learning technology, basic artificial intelligence research has gradually matured, and the field has begun to move toward higher-level applications that involve language understanding, image understanding and reasoning, such as question generation, machine translation and image description. Among these, question generation is one of the most valuable and challenging research tasks. It requires generating natural language questions that are free of grammatical errors and can be answered, based on an understanding of text or image content, and this process often involves reasoning. Current research on question generation falls into two categories: question generation based on plain text, and question generation based on vision. The former has been studied extensively and has produced some results, but existing work does not make full use of the structural and sequence information of the context, which limits model performance, and the generated questions are insufficiently answerable. The latter is still at a preliminary exploration stage. Previous work focused mainly on low-level image information while ignoring the high-level event information that the image expresses, and the generated questions are likewise insufficiently answerable. Because text-based question generation is the basis of multimodal question generation, this thesis first studies question generation from plain text and then question generation from vision.

To address the problems in text question generation, this thesis proposes an Entity Guided question generation with contextual structure and sequence Information Capturing model (EGIC), which captures both the structural and the sequence information of the context and improves the answerability of the generated questions. The model consists of a graph attention encoder, a multi-feature encoder, an answer-question type encoder, a feature fusion module and a decoder. The graph attention encoder captures contextual structure information, and the multi-feature encoder captures contextual sequence information. The answer-question type encoder encodes answer entities and question types to guide the generation of the interrogative word. The feature fusion module fuses the structural and sequence information, and the decoder generates the final question. Comparative experiments, ablation experiments and case analysis on the SQuAD dataset show that, compared with the state-of-the-art model, EGIC achieves the best results in text question generation, which demonstrates its effectiveness.

To address the problems in visual question generation, this thesis proposes a Question Type Driven Dual-channel visual question generation (QTDD) model, which consists of a graph attention encoder, an answer-question type encoder and a decoder. The graph attention encoder encodes instance-level scene graphs to obtain a scene-graph-based event representation. The answer-question type encoder encodes answer entities and question types to guide interrogative generation, thereby improving the answerability of the generated questions. The decoder generates the question words. Comparative experiments, ablation experiments and case analysis on the VQA2.0 dataset show that, compared with baseline models, QTDD achieves comparable results in visual question generation, which demonstrates its effectiveness for generating visual questions.
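Both models above rely on a graph attention encoder. The abstract does not give its equations, so the following is only a minimal sketch of a generic single-head graph attention layer (score each neighbor, normalize with softmax, take the weighted sum of projected neighbor features). It is not the thesis's implementation; the function name `graph_attention_layer`, the toy graph, and the values of the weight matrix `W` and attention vector `a` are all assumptions made for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity commonly used for graph attention scores."""
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def graph_attention_layer(features, neighbors, W, a):
    """Single-head graph attention over a small graph (illustrative only).

    features:  list of node feature vectors
    neighbors: dict mapping node index -> list of neighbor indices
               (include the node itself to add a self-loop)
    W:         projection matrix, shape out_dim x in_dim (assumed values)
    a:         attention vector of length 2 * out_dim (assumed values)
    """
    projected = [matvec(W, h) for h in features]
    outputs, attention = [], []
    for i, z_i in enumerate(projected):
        nbrs = neighbors[i]
        # Unnormalized score for each neighbor j: LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z_i + projected[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood of node i
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New representation: attention-weighted sum of projected neighbors
        out = [
            sum(alpha * projected[j][d] for alpha, j in zip(alphas, nbrs))
            for d in range(len(z_i))
        ]
        outputs.append(out)
        attention.append(dict(zip(nbrs, alphas)))
    return outputs, attention


# Toy graph with three nodes; all numbers are made up for the example.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.25, 0.75]

outputs, attention = graph_attention_layer(features, neighbors, W, a)
for node, weights in enumerate(attention):
    print(node, weights)
```

In a setting like the one described, the nodes might be context entities (for text) or scene-graph objects (for images), with edges taken from a dependency parse or from scene-graph relations; that mapping is an assumption here, not something the abstract specifies.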
Keywords/Search Tags:Question generation, Graph attention, Scene graph, Multimodal