
Research And Implementation Of Visual Question Answering Algorithm Based On Deep Attention Stacking

Posted on: 2023-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Q Gao
GTID: 2568306833488834
Subject: Electronic and communication engineering
Abstract/Summary:
Visual question answering (VQA) lies at the intersection of computer vision and natural language processing. A VQA system must process inputs from two modalities, an image and a text question, and produce a reasonable answer consistent with human reasoning. VQA has applications in assistance for blind people, image retrieval, transportation, and media and entertainment, so it has considerable research value. In early VQA methods, the interaction between the question and the image was mostly shallow. These methods ignored the dense interaction between each word and each image region, which is not enough to model the latent relationships between image and question in depth. In addition, most methods ignored the relationships within the same modality. To address these problems, this thesis draws on deep learning to propose two multi-modal VQA algorithms based on deep attention stacking, and finally implements a VQA system. The work covers the following aspects:

(1) To address insufficient information interaction within and across modalities, this thesis proposes DAS, a VQA algorithm based on deep attention stacking. The DAS model first uses a feature extraction module to obtain initial question and image features, then uses a multi-modal interaction module to make the question and image features interact closely, and finally uses an output classification module to predict the answer. A series of comparative experiments on the public VQA v2.0 dataset show that the DAS model effectively improves accuracy.

(2) To address the problem that the accuracy of the DAS model drops sharply as the number of iterations increases, this thesis proposes MDAS, a multi-branch VQA algorithm based on deep attention stacking. The MDAS model uses three branches, each a stack of unit models, to make the question and image features interact fully, and obtains the predicted answer through a multi-channel output module and a multi-channel loss function. A series of comparative experiments on the public VQA v2.0 dataset show that the MDAS model effectively improves performance, especially accuracy on Number questions.

(3) This thesis builds a VQA system, integrates the proposed MDAS algorithm into it, and demonstrates the system's functions.
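The abstract describes DAS and MDAS only at the module level, so what follows is a minimal NumPy sketch under stated assumptions, not the thesis's implementation. It assumes an MCAN-style interaction unit (self-attention within each modality, then question-guided attention in which image regions attend to words), omits all learned projection and feed-forward weights, and fuses the two modalities by mean pooling and addition. The names `interaction_unit`, `das_branch`, and `mdas_forward` are hypothetical. The branch depths (2, 4, 6) and the reading of "multi-channel" output and loss (averaging the branch logits and summing per-branch binary cross-entropy losses) are also hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: each row of q attends over the rows of k/v.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def layer_norm(x, eps=1e-6):
    # Per-row normalization without learned scale/shift (simplification).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def interaction_unit(words, regions):
    # Hypothetical unit: intra-modal self-attention for each modality,
    # then inter-modal guided attention (image regions attend to words).
    words = layer_norm(words + attention(words, words, words))
    regions = layer_norm(regions + attention(regions, regions, regions))
    regions = layer_norm(regions + attention(regions, words, words))
    return words, regions

def das_branch(words, regions, depth):
    # Deep attention stacking: apply the unit `depth` times in sequence.
    for _ in range(depth):
        words, regions = interaction_unit(words, regions)
    return words, regions

def predict_logits(words, regions, w_out):
    # Mean-pool both modalities, fuse by addition, project to answer logits.
    fused = words.mean(axis=0) + regions.mean(axis=0)
    return fused @ w_out

def bce_with_logits(logits, targets):
    # Numerically stable binary cross-entropy on raw logits (soft targets allowed).
    return np.mean(np.maximum(logits, 0) - logits * targets
                   + np.log1p(np.exp(-np.abs(logits))))

def mdas_forward(words, regions, depths, w_outs, targets):
    # Hypothetical multi-branch variant: each branch stacks units to its own
    # depth; logits are averaged and the loss sums the per-branch losses.
    branch_logits = []
    for depth, w_out in zip(depths, w_outs):
        w, r = das_branch(words, regions, depth)
        branch_logits.append(predict_logits(w, r, w_out))
    total_loss = sum(bce_with_logits(l, targets) for l in branch_logits)
    return np.mean(branch_logits, axis=0), total_loss

# Usage with random features: 14 question tokens, 36 image regions,
# 64-dim features and 10 candidate answers (all sizes assumed).
rng = np.random.default_rng(0)
words = rng.normal(size=(14, 64))
regions = rng.normal(size=(36, 64))
w_outs = [rng.normal(scale=0.1, size=(64, 10)) for _ in range(3)]
targets = np.zeros(10)
targets[3] = 1.0
logits, loss = mdas_forward(words, regions, (2, 4, 6), w_outs, targets)
print(logits.shape, loss > 0)
```

Averaging branch logits at inference while summing per-branch losses in training gives every branch its own supervision signal. This is one plausible way a multi-channel loss could counter the accuracy drop seen with a single deep stack, but the thesis may combine its branches differently.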
Keywords/Search Tags: Visual Question Answering, Multi-modal, Deep Learning, Attention Mechanism