
Research Of Visual Question Answering With Capsule Network

Posted on: 2022-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Pu
GTID: 2558307154978139
Subject: Computer technology
Abstract/Summary:
In recent years, Computer Vision and Natural Language Processing have each advanced rapidly, which has in turn driven the rise of their cross-disciplinary field, Visual Question Answering. The core of Visual Question Answering is finding the correlation between questions and images: a capable Visual Question Answering system should generate the correct answer to a question based on the image and the question associated with it. To obtain compact visual features, existing Visual Question Answering methods widely employ attention mechanisms that use the question to evaluate the importance of different image regions, highlighting key information and suppressing irrelevant information. However, as the task becomes harder, with increasingly diverse question forms and increasingly complex information needed to answer them, researchers have proposed ever more complex models. These models bring a huge number of parameters, are difficult to train, and easily overfit. With these considerations in mind, this thesis explores attention mechanisms for Visual Question Answering based on Capsule Network. The main contributions are as follows:

(1) Inspired by Capsule Network, this thesis proposes a Visual Question Answering model based on a capsule multi-head attention mechanism. The model improves the traditional multi-head attention mechanism with a routing algorithm similar to the one used in Capsule Network, so that its parameters are updated in a dynamic, iterative manner. This can improve the performance of multi-head attention on Visual Question Answering tasks and alleviate the excessive redundancy among the heads of a multi-head attention mechanism. Experiments on a publicly available dataset were conducted to validate the model's effectiveness.

(2) For multi-layer attention in Visual Question Answering tasks, this thesis further proposes a Visual Question Answering model based on a capsule self-guided co-attention mechanism. The model first applies self-attention to the image and to the question. Then, drawing on the idea of Capsule Network, it runs a routing algorithm over the set of image features, iteratively updating their coupling coefficients to obtain weight coefficients between different image regions. Finally, it uses the question text features to guide the attention weights over the image features; together, these steps form the co-attention mechanism. To verify the model's validity, separate quantitative and qualitative evaluations were conducted on publicly available datasets.
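Contribution (1) replaces the usual way of merging attention heads with a capsule-style routing step. The abstract does not give the exact update rule, so the following is only a minimal sketch under stated assumptions: it borrows routing-by-agreement from the original Capsule Network (Sabour et al., 2017), treats each head's output as an input capsule routed to a single output capsule, forms heads by slicing the feature dimension instead of using learned projections, and runs three routing iterations. The function names (`capsule_multi_head_attention`, `route_heads`, and so on) are hypothetical and are not taken from the thesis.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squash(s):
    # Capsule squashing: keeps the direction of s, maps its length into [0, 1)
    norm_sq = dot(s, s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + 1e-9)
    return [scale * x for x in s]


def attention_head(query, keys, values):
    # Scaled dot-product attention for a single query vector
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def route_heads(head_outputs, iterations=3):
    # Routing by agreement: each head output is an input capsule routed to one
    # output capsule. Returns the output capsule and the coupling used to build it.
    logits = [0.0] * len(head_outputs)
    dim = len(head_outputs[0])
    for _ in range(iterations):
        coupling = softmax(logits)
        s = [sum(c * u[j] for c, u in zip(coupling, head_outputs)) for j in range(dim)]
        v = squash(s)
        # Heads whose output agrees with v get a larger coupling next iteration
        logits = [b + dot(u, v) for b, u in zip(logits, head_outputs)]
    return v, coupling


def capsule_multi_head_attention(query, keys, values, num_heads, iterations=3):
    # Hypothetical: heads come from slicing the feature dimension (no learned
    # W_q, W_k, W_v projections) and are merged by routing, not concatenation.
    d = len(query)
    assert d % num_heads == 0, "feature size must divide evenly across heads"
    dh = d // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * dh, (h + 1) * dh
        heads.append(attention_head(query[lo:hi],
                                    [k[lo:hi] for k in keys],
                                    [v[lo:hi] for v in values]))
    return route_heads(heads, iterations)


# Toy usage: a 4-d question vector attending over three 4-d image-region features
question = [1.0, 0.0, 0.5, 0.2]
regions = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
output, coupling = capsule_multi_head_attention(question, regions, regions, num_heads=2)
print(output, coupling)
```

In this sketch the coupling coefficients over heads are recomputed on every forward pass rather than fixed by a concatenation and output projection; the thesis's actual routing rule, and how it reduces redundancy among heads, may differ.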
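Contribution (2) chains three steps: self-attention on each modality, routing over the image features, and question guidance of the image attention weights. The sketch below follows that order under several assumptions that the abstract does not specify: image-region and question-word features already share one dimension, self-attention has no learned projections, the question is summarized by the mean of its self-attended word vectors, and the question guidance is added to the routing logits before a final softmax. All function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squash(s):
    # Capsule squashing: keeps direction, maps length into [0, 1)
    norm_sq = dot(s, s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + 1e-9)
    return [scale * x for x in s]


def weighted_sum(weights, vectors):
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(len(vectors[0]))]


def self_attention(features):
    # Each feature attends over the whole set (scaled dot product, no projections)
    d = len(features[0])
    return [weighted_sum(softmax([dot(x, y) / math.sqrt(d) for y in features]), features)
            for x in features]


def route_regions(regions, iterations=3):
    # Routing by agreement with regions as input capsules and one output capsule;
    # returns the routing logits after the final agreement update.
    logits = [0.0] * len(regions)
    for _ in range(iterations):
        v = squash(weighted_sum(softmax(logits), regions))
        logits = [b + dot(r, v) for b, r in zip(logits, regions)]
    return logits


def capsule_co_attention(image_regions, question_words, iterations=3):
    img = self_attention(image_regions)   # step 1: self-attention on each modality
    qst = self_attention(question_words)
    q = [sum(w[j] for w in qst) / len(qst) for j in range(len(qst[0]))]  # question summary
    routing_logits = route_regions(img, iterations)  # step 2: routing over regions
    d = len(q)
    guide = [dot(q, r) / math.sqrt(d) for r in img]  # step 3: question relevance per region
    weights = softmax([b + g for b, g in zip(routing_logits, guide)])
    return weighted_sum(weights, img), weights


# Toy usage: three 3-d image regions, two 3-d question words
regions = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.9, 0.1, 0.0]]
words = [[1.0, 0.1, 0.0], [0.8, 0.0, 0.3]]
attended, weights = capsule_co_attention(regions, words)
print(attended, weights)
```

Adding the question scores to the routing logits is only one possible way to let the question "guide" the region weights; a multiplicative gate or question-initialized routing logits would be equally plausible readings of the abstract.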
Keywords/Search Tags: Computer Vision, Natural Language Processing, Visual Question Answering, Attention mechanism, Capsule Network