
Cross-modal Retrieval Research Based On Reinforcement Learning

Posted on: 2024-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
Full Text: PDF
GTID: 2568307079976549
Subject: Electronic information
Abstract/Summary:
Cross-modal retrieval refers to the process of retrieving information across different media types such as image, audio, and text. It has become one of the research hotspots in computer vision, natural language processing, and machine learning, and image-text retrieval in particular has made great progress and attracted growing attention in recent years. The core task of cross-modal retrieval is to accurately measure the similarity between multimodal data. In the interactive cross-modal image retrieval scenario, the heterogeneity of data across modalities and the unbalanced distribution of data among modalities pose many challenges to model construction. First, the traditional interaction scheme passively receives user feedback and then iteratively supplements incomplete information, which demands a large amount of feedback, consumes too much of the user's effort, and makes retrieval take too long. Second, when the user describes only some local regions of an image, the retrieval results usually fail to match because the information provided is incomplete. Third, because human-computer dialogue data is difficult to obtain and must be annotated by hand, fully supervised training is unrealistic. Finally, the mixed noise in multimedia data interferes with the robustness of metric learning algorithms, so improving the robustness of metric learning methods is also a major problem.

To address these problems, this paper proposes the following main innovations. First, it proposes a novel interactive cross-modal retrieval framework in which human-computer interaction proceeds via inquiry/confirmation. Because comprehensive human-computer dialogue data is difficult to obtain, making fully supervised training unrealistic, this paper adopts a weakly supervised training method that requires only an image-text dataset; this reduces the data-processing workload and saves substantial time. Second, it proposes a reinforcement learning strategy that enables the model to actively search for clearly distinguishable objects, identify the discriminative details missing from the current query, and supplement that missing information, instead of passively receiving it from user feedback; this greatly improves the retrieval performance of the model and is more practical than other dialogue-based retrieval models (a sketch of such a policy update follows this abstract). Third, building on the interactive cross-modal retrieval framework, the paper carries out an in-depth study of metric learning techniques and designs a maximum polynomial loss function that provides a robust metric loss for cross-modal retrieval tasks (a hypothetical sketch also follows). Experimental results show that this loss function significantly improves the convergence rate and retrieval efficiency of the cross-modal retrieval model.
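The abstract does not specify the exact reinforcement learning formulation, so the following is a minimal sketch, assuming a REINFORCE-style policy gradient in PyTorch in which the agent chooses which inquiry/confirmation question to pose and is rewarded by the retrieval improvement after the user's answer. All names (QuestionPolicy, reinforce_step, reward_fn) are hypothetical, not the thesis's actual implementation.

```python
import torch

class QuestionPolicy(torch.nn.Module):
    """Hypothetical policy network: scores candidate inquiry/confirmation
    actions given an embedding of the current dialogue state."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.head = torch.nn.Linear(state_dim, num_actions)

    def forward(self, state):
        # A categorical distribution over possible questions to ask.
        return torch.distributions.Categorical(logits=self.head(state))

def reinforce_step(policy, optimizer, state, reward_fn):
    """One REINFORCE update: sample a question, observe the retrieval
    improvement as reward, and ascend the policy gradient."""
    dist = policy(state)
    action = dist.sample()
    # Assumed reward: e.g., rank gain of the target image after the answer.
    reward = reward_fn(action)
    loss = -dist.log_prob(action) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return action.item(), reward

# Illustrative usage with stand-in dimensions and a dummy reward.
policy = QuestionPolicy(state_dim=512, num_actions=64)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
state = torch.randn(512)  # placeholder for a real dialogue-state encoding
reinforce_step(policy, optimizer, state, reward_fn=lambda a: 1.0)
```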
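The precise form of the proposed maximum polynomial loss is likewise not given here; the sketch below is a hedged reconstruction that combines hardest-negative ("max") mining, standard in image-text matching, with a polynomial expansion of the hinge violation (in the spirit of PolyLoss-style coefficient weighting). The margin and coefficients are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def max_polynomial_loss(img_emb, txt_emb, margin=0.2, coeffs=(1.0, 0.5)):
    """Hypothetical metric loss: hardest-negative mining plus a polynomial
    expansion of the hinge term. Not the thesis's exact formulation."""
    # Cosine similarity matrix between all image/text pairs in the batch.
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    sim = img_emb @ txt_emb.t()            # (B, B)
    pos = sim.diag().unsqueeze(1)          # matched pairs on the diagonal

    # Hinge violations against the hardest negative in each direction.
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(mask, float('-inf'))
    viol_i2t = (margin + neg.max(dim=1).values.unsqueeze(1) - pos).clamp(min=0)
    viol_t2i = (margin + neg.max(dim=0).values.unsqueeze(1) - pos).clamp(min=0)

    # Polynomial expansion: sum_j a_j * violation^j over both directions.
    loss = 0.0
    for j, a in enumerate(coeffs, start=1):
        loss = loss + a * (viol_i2t.pow(j).mean() + viol_t2i.pow(j).mean())
    return loss

# Illustrative usage with random embeddings.
img = torch.randn(32, 256, requires_grad=True)
txt = torch.randn(32, 256, requires_grad=True)
max_polynomial_loss(img, txt).backward()
```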
Keywords/Search Tags:Interactive cross-modal retrieval framework, Reinforcement Learning, Max Polynomial Loss