Research On External Knowledge Dynamic Visual Commonsense Reasoning

Posted on:2024-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Zhang

Full Text:PDF

GTID:2558307127960949

Subject:Computer technology

Abstract/Summary:

In recent years, visual commonsense reasoning, an important part of cross-modal intelligence research, has attracted extensive attention from researchers in the multi-modal field. Although this research has produced many excellent results, three problems still urgently need to be solved: how to further narrow the semantic gap between the image and text modalities, how to mine reasonable knowledge of the external world, and how to open up the black-box model to achieve explicit reasoning. To address these issues, this thesis proposes a visual commonsense reasoning method driven by external knowledge. The main contents of the thesis are as follows:

(1) We propose a visual commonsense reasoning model based on multi-task learning. The model shares the feature extractor parameters between the image-text matching module and the visual commonsense reasoning module, mines the cross-modal knowledge learned by the image-text matching module as external knowledge, and applies it to the visual commonsense reasoning module, thereby improving the generalization ability and performance of the model. In the image-text matching module, we propose a multiple feature pooling operation in which the most suitable pooling operation is selected for each local feature to obtain its global feature, thereby achieving alignment between the visual and text modalities and narrowing the semantic gap.

(2) We propose a visual commonsense reasoning model enhanced by graph reasoning. The model feeds visual features, text features, and graph node representations into a Transformer encoder and allows information from the different modalities to interact whenever the features of a layer are updated, so that the model can learn structured external knowledge and bridge the semantic gap between modalities. For retrieving knowledge sub-graphs, we propose a node correlation scoring mechanism: guided by the query and the responses, the correlation between nodes is calculated with a graph attention network to obtain knowledge sub-graphs that assist model prediction.

(3) We design a prototype system for visual commonsense reasoning. The prototype system is implemented with a mix of Vue and Python, and its interface provides functions such as answer selection and answer verification.
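The node correlation scoring idea in item (2) can be illustrated with a small sketch. This is not the thesis's mechanism: a fixed dot-product attention stands in for a trained graph attention network, a single context vector stands in for the encoded query and responses, and every name below (`score_nodes`, `retrieve_subgraph`, the toy node features) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score_nodes(context, node_feats, edges):
    """Score each node by its own match to the context plus its neighbours'
    matches, weighted by attention over the node's neighbourhood.

    Assumption: attention weights come from raw feature similarity here,
    whereas a real graph attention network would learn them.
    """
    base = {n: dot(context, f) for n, f in node_feats.items()}
    neighbours = {n: [] for n in node_feats}
    for u, v in edges:  # edges are treated as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    scores = {}
    for n in node_feats:
        group = [n] + neighbours[n]
        attn = softmax([dot(node_feats[n], node_feats[m]) for m in group])
        scores[n] = sum(a * base[m] for a, m in zip(attn, group))
    return scores


def retrieve_subgraph(context, node_feats, edges, k):
    """Keep the k highest-scoring nodes and the edges between them."""
    scores = score_nodes(context, node_feats, edges)
    keep = set(sorted(scores, key=scores.get, reverse=True)[:k])
    kept_edges = [(u, v) for u, v in edges if u in keep and v in keep]
    return keep, kept_edges


# Toy knowledge graph with 2-dimensional, hand-written node features.
node_feats = {
    "umbrella": [1.0, 0.0],
    "rain": [0.9, 0.1],
    "piano": [0.0, 1.0],
    "music": [0.1, 0.9],
}
edges = [("umbrella", "rain"), ("piano", "music"), ("rain", "music")]
context = [1.0, 0.0]  # stand-in for an encoded query/response pair

nodes, sub_edges = retrieve_subgraph(context, node_feats, edges, k=2)
print(sorted(nodes), sub_edges)
```

In this toy setting the retrieved sub-graph keeps the nodes whose neighbourhoods best match the context; a learned model would replace both the features and the attention weights.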

Keywords/Search Tags:

Visual commonsense reasoning, External knowledge, Semantic gap, Cross modal

Related items

1	Research On External Knowledge Integrated Reasoning For Commonsense Question Answering
2	Research And Implementation Of Commonsense Reasoning Technology Based On Knowledge Fusion
3	Research And Implementation Of Commonsense Reasoning Technology Based On Path Mining
4	Research On Key Algorithms Of Visual Question Answering Based On External Knowledge And Semantic Understanding
5	Research On Method Of Acquiring Commonsense Knowledge Based On Semantic Taxonomy
6	Research And Implementation Of Commonsense Reasoning Technologies Based On Multiple Knowledge Fusing
7	Research On Deep Image Captioning Technology With Semantic Guidance
8	Research On Visual Commonsense Reasoning Based On Attention Networks
9	Research On Natural Language Semantic Representation And Reasoning Based On Neural Networks
10	Research And Application Of Elastic Semantic Reasoning For Large-scale Knowledge Graph