
Research On Image-aware Story Ending Generation

Posted on: 2022-06-07  Degree: Master  Type: Thesis
Country: China  Candidate: C Huang  Full Text: PDF
GTID: 2518306536953549  Subject: Control Science and Engineering
Abstract/Summary:
With the development of computer vision and natural language processing, story generation tasks that take image or text information as input have been studied in increasing depth, but few studies address story generation that takes image and text as input at the same time. This thesis proposes an image-aware story ending generation (IaSEG) task, which generates a story ending given the story context and one context-related image. The goal is to generate story endings that not only conform to the logic of the story plot but also contain the semantic information of the image.

The main challenges of the proposed task are: 1) the model must understand the story context and the image information effectively; 2) the model must fully integrate language and vision information and construct the explicit and implicit relations within and across modalities; 3) the model must select the visual concepts in the image that match the trend of the story plot, and further mine the high-level semantics of the image to produce more coherent, semantically rich, and attractive endings.

To tackle these challenges of the IaSEG task, this thesis proposes a story ending generation model based on a multiple graph neural network and a multiple long short-term memory (LSTM) network. Each sentence of the story is first parsed to obtain its dependency tree, from which a sentence graph is constructed. The model then encodes each single sentence with a single graph neural network and the story context with a multiple graph neural network, capturing the logical relations across the story context. Finally, the model generates the story ending with a multiple LSTM network. In particular, a cascade text-image attention mechanism in the decoder fuses the text features and image features and chooses the visual concepts related to the trend of the story, so that image semantics are introduced into the generated text. Two instantiations are designed: a Multiple Graph ATtention LSTM network (MGATL) and a Multiple Graph Convolution Network LSTM network (MGCNL).

In addition, this thesis uses Seq2Seq, Transformer, IE-MSA, and T-CVAE as comparison baselines and conducts experiments on both story ending generation and image-aware story ending generation. Extensive experiments, ablation studies, case studies, and visualizations show that the model based on the multiple graph convolution network can encode the story context effectively. By selecting the important visual concepts with the multiple LSTM, the model can generate story endings that are logically self-consistent, semantically rich, and consistent with the image content. With the help of the image information, more specific and readable story endings can be generated.
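The sketch below (not the thesis code) illustrates the two ideas the abstract describes: building a sentence graph from a dependency parse and encoding it with graph attention, and a cascade text-image attention step that attends to the story context first and then to image regions before decoding. The class names, embedding sizes, and the use of spaCy for dependency parsing are illustrative assumptions.

```python
# Minimal sketch, assuming spaCy for parsing and PyTorch for the model pieces.
import spacy
import torch
import torch.nn as nn
import torch.nn.functional as F

nlp = spacy.load("en_core_web_sm")  # assumption: any dependency parser would do


def dependency_adjacency(sentence: str):
    """Build a symmetric adjacency matrix over tokens from the dependency tree."""
    doc = nlp(sentence)
    n = len(doc)
    adj = torch.eye(n)  # self-loops so every node attends to itself
    for tok in doc:
        if tok.i != tok.head.i:          # skip the root's self-edge
            adj[tok.i, tok.head.i] = 1.0
            adj[tok.head.i, tok.i] = 1.0
    return [tok.text for tok in doc], adj


class SentenceGraphEncoder(nn.Module):
    """Single-head, GAT-style attention restricted to dependency edges."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n, dim) token embeddings; adj: (n, n) dependency adjacency
        h = self.proj(x)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)], dim=-1
        )
        scores = F.leaky_relu(self.attn(pairs)).squeeze(-1)   # (n, n) edge scores
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend only along parse edges
        alpha = torch.softmax(scores, dim=-1)
        return torch.relu(alpha @ h)                          # updated node states


class CascadeTextImageAttention(nn.Module):
    """Decoder step attends to text features first, then to image features (cascade order)."""

    def __init__(self, dim: int):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, dec_state, text_mem, image_mem):
        # dec_state: (B, 1, dim); text_mem: (B, Lt, dim); image_mem: (B, Li, dim)
        t, _ = self.text_attn(dec_state, text_mem, text_mem)   # stage 1: story context
        v, _ = self.image_attn(t, image_mem, image_mem)        # stage 2: visual concepts
        return dec_state + t + v                               # fused feature for the LSTM decoder


if __name__ == "__main__":
    tokens, adj = dependency_adjacency("Tom finally found his lost dog in the park.")
    dim = 64
    x = torch.randn(len(tokens), dim)                  # stand-in token embeddings
    enc = SentenceGraphEncoder(dim)
    sent_repr = enc(x, adj).mean(dim=0, keepdim=True)  # (1, dim) sentence vector

    fuse = CascadeTextImageAttention(dim)
    dec_state = torch.randn(1, 1, dim)
    text_mem = sent_repr.unsqueeze(0)                  # (1, 1, dim) story memory
    image_mem = torch.randn(1, 36, dim)                # e.g. 36 pooled region features
    print(fuse(dec_state, text_mem, image_mem).shape)  # torch.Size([1, 1, 64])
```

In the thesis the multiple graph network additionally connects sentence-level graphs across the story context, and the decoder is a multiple LSTM; the cascade ordering shown here (text before image) follows the abstract's description of fusing story features first and then grounding on image regions.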
Keywords/Search Tags:Story ending generation, Multimodal, Graph convolutional network, Graph attention network, Attention mechanism