
Research On Image-aware Story Ending Generation

Posted on: 2022-06-07  Degree: Master  Type: Thesis
Country: China  Candidate: C Huang  Full Text: PDF
GTID: 2518306536953549  Subject: Control Science and Engineering
Abstract/Summary:
With the development of computer vision and natural language processing, story generation tasks that take image or text information as input have been studied in increasing depth, but few studies address story generation that takes image and text as input at the same time. This thesis proposes an image-aware story ending generation (IaSEG) task, which generates a story ending given the story context and one context-related image. The goal is to generate story endings that not only conform to the logic of the story plot but also contain the semantic information of the image.

The main challenges of the proposed task are: 1) the model must understand the story context and the image information effectively; 2) the model must fully integrate language and vision information and construct the explicit and implicit relations within and across modalities; 3) the model must select the visual concepts in the image that match the trend of the story plot, and further mine the high-level semantics of the image to produce more coherent, semantically rich, and attractive endings.

To tackle these challenges of the IaSEG task, this thesis proposes a story ending generation model based on a multiple graph neural network and a multiple long short-term memory (LSTM) network. Each sentence of the story is first parsed to obtain its dependency tree, from which a sentence graph is constructed. The model then encodes each single sentence with a single graph neural network and the story context with a multiple graph neural network, capturing the logical relations across the story context. Finally, the model generates the story ending with a multiple LSTM network. In particular, a cascade text-image attention mechanism in the decoder fuses the text features and image features and chooses the visual concepts related to the trend of the story, so that image semantics are introduced into the generated text. Two instantiations are designed: a Multiple Graph ATtention LSTM network (MGATL) and a Multiple Graph Convolution Network LSTM network (MGCNL).

In addition, this thesis uses Seq2Seq, Transformer, IE-MSA, and T-CVAE as comparison baselines and conducts experiments on both story ending generation and image-aware story ending generation. Extensive experiments, ablation studies, case studies, and visualizations show that the model based on the multiple graph convolution network can encode the story context effectively. By selecting the important visual concepts with the multiple LSTM, the model can generate story endings that are logically self-consistent, semantically rich, and consistent with the image content. With the help of the image information, more specific and readable story endings can be generated.
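The sketch below (not the thesis code) illustrates the two ideas the abstract describes: building a sentence graph from a dependency parse and encoding it with graph attention, and a cascade text-image attention step that attends to the story context first and then to image regions before decoding. The class names, embedding sizes, and the use of spaCy for dependency parsing are illustrative assumptions.

```python
# Minimal sketch, assuming spaCy for parsing and PyTorch for the model pieces.
import spacy
import torch
import torch.nn as nn
import torch.nn.functional as F

nlp = spacy.load("en_core_web_sm")  # assumption: any dependency parser would do


def dependency_adjacency(sentence: str):
    """Build a symmetric adjacency matrix over tokens from the dependency tree."""
    doc = nlp(sentence)
    n = len(doc)
    adj = torch.eye(n)  # self-loops so every node attends to itself
    for tok in doc:
        if tok.i != tok.head.i:          # skip the root's self-edge
            adj[tok.i, tok.head.i] = 1.0
            adj[tok.head.i, tok.i] = 1.0
    return [tok.text for tok in doc], adj


class SentenceGraphEncoder(nn.Module):
    """Single-head, GAT-style attention restricted to dependency edges."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) token embeddings; adj: (n, n) dependency adjacency
        h = self.proj(x)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)], dim=-1
        )
        scores = F.leaky_relu(self.attn(pairs)).squeeze(-1)   # (n, n) edge scores
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend only along parse edges
        alpha = torch.softmax(scores, dim=-1)
        return torch.relu(alpha @ h)                          # updated node states


class CascadeTextImageAttention(nn.Module):
    """Decoder step attends to text features first, then to image features (cascade order)."""

    def __init__(self, dim: int):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, dec_state, text_mem, image_mem):
        # dec_state: (B, 1, dim); text_mem: (B, Lt, dim); image_mem: (B, Li, dim)
        t, _ = self.text_attn(dec_state, text_mem, text_mem)   # stage 1: story context
        v, _ = self.image_attn(t, image_mem, image_mem)        # stage 2: visual concepts
        return dec_state + t + v                               # fused feature for the LSTM decoder


if __name__ == "__main__":
    tokens, adj = dependency_adjacency("Tom finally found his lost dog in the park.")
    dim = 64
    x = torch.randn(len(tokens), dim)                  # stand-in token embeddings
    enc = SentenceGraphEncoder(dim)
    sent_repr = enc(x, adj).mean(dim=0, keepdim=True)  # (1, dim) sentence vector

    fuse = CascadeTextImageAttention(dim)
    dec_state = torch.randn(1, 1, dim)
    text_mem = sent_repr.unsqueeze(0)                  # (1, 1, dim) story memory
    image_mem = torch.randn(1, 36, dim)                # e.g. 36 pooled region features
    print(fuse(dec_state, text_mem, image_mem).shape)  # torch.Size([1, 1, 64])
```

In the thesis the multiple graph network additionally connects sentence-level graphs across the story context, and the decoder is a multiple LSTM; the cascade ordering shown here (text before image) follows the abstract's description of fusing story features first and then grounding on image regions.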
Keywords/Search Tags:Story ending generation, Multimodal, Graph convolutional network, Graph attention network, Attention mechanism