
Research On External Knowledge Dynamic Visual Commonsense Reasoning

Posted on: 2024-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Zhang
Full Text: PDF
GTID: 2558307127960949
Subject: Computer technology
Abstract/Summary:
In recent years, visual commonsense reasoning, an important part of cross-modal intelligence research, has attracted extensive attention from researchers in the multimodal field. Although this research has produced many strong results, three problems still need to be solved urgently: how to further narrow the semantic gap between the image modality and the text modality, how to mine reasonable knowledge of the external world, and how to open up the black-box model to achieve explicit reasoning. To address these issues, this thesis proposes visual commonsense reasoning methods driven by external knowledge. The main contents of the thesis are as follows:

(1) We propose a visual commonsense reasoning model based on multi-task learning. The model shares the feature extractor parameters of the image-text matching module and the visual commonsense reasoning module, treats the cross-modal knowledge learned by the image-text matching module as external knowledge, and applies it to the visual commonsense reasoning module. This improves the generalization ability and performance of the model. In the image-text matching module, we propose a multiple-pooling operation for features: the most suitable pooling operation is selected for each local feature to obtain its global feature, thereby aligning the visual and text modalities and narrowing the semantic gap.

(2) We propose a visual commonsense reasoning model enhanced by graph reasoning. The model feeds visual features, text features, and graph node representations into a Transformer encoder and lets information from the different modalities interact as the features of each layer are updated, so that the model can learn structured external knowledge and bridge the semantic gap between modalities. For retrieving knowledge subgraphs, we propose a node correlation scoring mechanism: guided by the query and the responses, the correlation between nodes is computed with a graph attention network to obtain the knowledge subgraphs that assist model prediction.

(3) We design a prototype system for visual commonsense reasoning. The prototype is implemented with a mix of Vue and Python, and its interface provides functions such as answer selection and answer verification.
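
The subgraph retrieval step in (2) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract says node correlation is computed with a graph attention network, whereas the code below swaps in plain scaled dot-product scoring against a single context vector that stands in for the combined query and response embedding. All names (`score_nodes`, `retrieve_subgraph`, the toy embeddings, the cutoff `k`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_nodes(context, node_embeddings):
    """Relevance of each node to the query/response context.

    Assumption: scaled dot-product attention replaces the graph
    attention network mentioned in the abstract.
    """
    scale = math.sqrt(len(context))
    names = list(node_embeddings)
    logits = [dot(context, node_embeddings[n]) / scale for n in names]
    return dict(zip(names, softmax(logits)))


def retrieve_subgraph(context, node_embeddings, edges, k):
    """Keep the k highest-scoring nodes and the edges among them."""
    scores = score_nodes(context, node_embeddings)
    kept = set(sorted(scores, key=scores.get, reverse=True)[:k])
    sub_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, sub_edges, scores


# Toy data (hypothetical): 2-d embeddings for three knowledge-graph nodes.
context = [1.0, 0.0]
nodes = {
    "umbrella": [2.0, 0.0],
    "rain": [1.5, 0.5],
    "piano": [-1.0, 1.0],
}
edges = [("umbrella", "rain"), ("rain", "piano")]

kept, sub_edges, scores = retrieve_subgraph(context, nodes, edges, k=2)
print(sorted(kept), sub_edges)
```

In this sketch the retained nodes and their connecting edges form the knowledge subgraph that would be passed on to the reasoning model; a learned graph attention layer would replace `score_nodes` in a real system.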
Keywords/Search Tags: Visual commonsense reasoning, External knowledge, Semantic gap, Cross-modal