Research On Question Answering Systems For Visual Content And Emotion Perception

Posted on:2024-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:J Cai

Full Text:PDF

GTID:2568307157482264

Subject:Computer Science and Technology

Abstract/Summary:

Visual Question Answering (VQA) is a multimedia understanding task that requires computers to answer natural language questions about the content of a given image. Since its conception, VQA has attracted considerable attention from researchers. Emotional Visual Question Answering extends VQA: it answers questions about visual emotions as well as visual content, and incorporates emotional information into the answers. The main contributions of this thesis are as follows:

(1) Early VQA models performed poorly on emotion-related questions, mainly because they neglected the emotional information in images and made insufficient use of key image regions and key words in the text. The resulting shallow understanding of fine-grained questions reduced answer accuracy. To fully incorporate image emotion information into VQA models and use it to improve the model's ability to answer questions, we propose an Image Emotion Enhanced Multimodal Visual Question Answering model (IEVQA). IEVQA consists of two main modules: a semantic module and an emotion module. The semantic module processes the semantic information in VQA tasks, while the emotion module analyzes emotional attributes in images. The two modules share the same Transformer encoder, which fuses semantic and emotional information when processing questions. Experiments on the related VQA benchmark dataset show that IEVQA outperforms the comparison methods on comprehensive indicators, and they validate the use of emotional information to assist VQA models.

(2) Current emotional VQA tasks mainly focus on multiple-choice VQA. However, existing emotional VQA models tend to produce less natural answers after incorporating emotional information, and introducing emotional information before obtaining the answer reduces the accuracy of choices. To incorporate image emotion information naturally into multiple-choice VQA answers without compromising model accuracy, we propose a Prompt-based Image Emotional Visual Question Answering method (PIEVQA). PIEVQA designs explicit emotional prompt text for each image's emotional information, then feeds the correct answer chosen by the VQA model, together with the emotional prompt text, into the pretrained GPT-3 model to obtain an answer that carries emotional information. Experiments on the VQA-v2 and Visual7W datasets verify the effectiveness of PIEVQA. The results show that, compared with other VQA models, PIEVQA generates more natural and human-like emotional answers that better describe the emotional information of images while maintaining high accuracy. This provides new insights for emotional expression in the VQA field and paves the way for new application scenarios of prompt-based VQA methods.
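The prompt-based step described for PIEVQA can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the prompt wording, so `PROMPT_TEMPLATE`, `VQAResult`, `build_emotional_prompt`, and `generate_emotional_answer` are hypothetical names, and the language model is passed in as a plain callable rather than a real GPT-3 client.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical template: the abstract says an explicit emotional prompt text
# is designed per image, but does not state its wording.
PROMPT_TEMPLATE = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "The image conveys a feeling of {emotion}.\n"
    "Rewrite the answer as one natural sentence that reflects this feeling:"
)


@dataclass
class VQAResult:
    """Output of an upstream multiple-choice VQA model plus an emotion label."""
    question: str
    answer: str   # the choice already selected by the VQA model
    emotion: str  # emotion label predicted for the image (assumed given)


def build_emotional_prompt(result: VQAResult) -> str:
    """Combine the chosen answer and the image emotion into one prompt."""
    return PROMPT_TEMPLATE.format(
        question=result.question,
        answer=result.answer,
        emotion=result.emotion,
    )


def generate_emotional_answer(
    result: VQAResult, complete: Callable[[str], str]
) -> str:
    """Ask a language model to rephrase the answer with emotional content.

    `complete` is any function mapping a prompt string to generated text,
    for example a thin wrapper around a pretrained language model.
    """
    return complete(build_emotional_prompt(result)).strip()


if __name__ == "__main__":
    # Stand-in for the language model so the sketch runs offline.
    def fake_model(prompt: str) -> str:
        return " A happy dog is joyfully catching a frisbee. "

    result = VQAResult(
        question="What is the dog doing?",
        answer="catching a frisbee",
        emotion="joy",
    )
    print(build_emotional_prompt(result))
    print(generate_emotional_answer(result, fake_model))
```

In this sketch the answer is selected before any emotional text is introduced, which mirrors the abstract's point that emotion is added afterwards so that choice accuracy is not affected.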

Keywords/Search Tags:

Visual Question Answering, Multimedia Understanding, Emotional Visual Question Answering, Prompt-based Learning

Related items

1	Research Of Visual Question Answering Based On Cross-media Multimodal Representation Learning
2	Research On Visual Question Answering Method With Visual Content Understanding And Text Information Analysis
3	Research On Affective Visual Question Answering
4	Research And Application Of Multi-domain Visual Question Answering System Based On Image Comprehension
5	Research On Key Techniques Of Question Understanding For Open-domain Question Answering System
6	Research On Visual Question Answering Method Based On Deep Learning
7	Research On Visual Question Answering Algorithm Based On Deep Learning
8	Research And Application Of Visual Question And Answering Algorithm Based On Deep Learning
9	Research Of Visual Question Answering Method Based On Deep Learning
10	Research On Visual Question Answering Based On Deep Neural Network