
Research and Application of Image Retrieval with Text Manipulation

Posted on: 2023-08-31 | Degree: Master | Type: Thesis
Country: China | Candidate: J H Zha
GTID: 2568306779971599 | Subject: Electronic information
Abstract/Summary:
Image retrieval is a hot research topic in computer vision. Traditional image retrieval systems usually take either text or an image as input, i.e., text-based image retrieval and content-based image retrieval. However, pure text or a single image usually cannot accurately express the user's intention. In many cases, users want to query a target image by applying their own modifications to an existing image, i.e., to use a query image together with a text describing the modification to retrieve the target images that satisfy the conditions. This retrieval mode is called image retrieval with text manipulation, and it is widely applicable. In clothing retrieval, for example, users can use it to find items that are similar to a given T-shirt image but have a different color or collar design. The difficulty of this task lies in handling the differences between modalities during multimodal feature fusion and in correlating the semantic information of the text with the visual information of the image. Most approaches use deep metric learning to compute the similarity between the query and candidate images by fusing the global feature of the query image with the text feature. However, the modification text usually relates to local features of the query image rather than to its global feature. Therefore, we propose an image retrieval model based on local feature modification, LFM-IR, whose core idea is to correlate the semantic information of the text with the visual information of the image through an attention mechanism and then modify the local features of the query image. The LFM-IR model contains four modules: a feature extraction module, a spatial attention module, a channel attention module and a feature modification module. The spatial attention module focuses on the text-related image regions, the channel attention module focuses on the text-related attributes, and the feature modification module performs the specific modification.

The contributions of this dissertation can be summarized as follows:
(1) An image retrieval model based on local feature modification, LFM-IR, is proposed for image retrieval with text manipulation. The model uses the text information to modify local features of the query image. Extensive experiments on three benchmark datasets evaluate existing approaches and ours, and the results show that LFM-IR performs better.
(2) Simple and effective spatial and channel attention modules are designed to focus on the image regions and attributes that need to be modified. Visualization experiments verify the accuracy of the spatial attention module, which enhances the interpretability of LFM-IR, and ablation studies demonstrate the effectiveness of both attention modules.
(3) We analyze several factors that affect the LFM-IR model and optimize it to further improve its retrieval performance. The model is also applied in practice to build a clothing retrieval system. In addition to the traditional image retrieval mode, the system supports image retrieval with text manipulation, which flexibly meets users' needs and helps them quickly and accurately find the clothing images they expect among a large collection of clothing images.

The model proposed in this dissertation provides a new idea for image retrieval with text manipulation and can be applied directly to clothing retrieval on e-commerce platforms.
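The abstract names the LFM-IR modules but gives no equations, so the following is only a minimal, parameter-free sketch of the general idea of text-guided local feature modification, not the author's implementation. Every choice below is a hypothetical stand-in for a learned layer: the text feature is assumed to be already projected to the same dimension C as each image region; spatial attention is a scaled dot-product softmax over regions; the channel gate is `tanh` of the gap between the text and the attended region feature; the modification blends each attended region toward the text; and retrieval uses global average pooling with cosine similarity. All function names are illustrative.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[Vector]  # one C-dimensional vector per spatial position (H*W positions)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(fmap: FeatureMap, text: Vector) -> Vector:
    """Hypothetical: weight each region by its scaled dot-product relevance to the text."""
    scale = math.sqrt(len(text))
    return softmax([dot(region, text) / scale for region in fmap])


def channel_attention(fmap: FeatureMap, text: Vector, attn: Vector) -> Vector:
    """Hypothetical: gate each channel in [0, 1) by how far the attended region is from the text."""
    attended = [sum(a * region[k] for a, region in zip(attn, fmap)) for k in range(len(text))]
    return [math.tanh(abs(t - r)) for t, r in zip(text, attended)]


def compose_query(fmap: FeatureMap, text: Vector) -> FeatureMap:
    """Move each region toward the text, scaled by its spatial weight and the channel gates."""
    attn = spatial_attention(fmap, text)
    gates = channel_attention(fmap, text, attn)
    return [
        [x + a * g * (t - x) for x, g, t in zip(region, gates, text)]
        for region, a in zip(fmap, attn)
    ]


def pool(fmap: FeatureMap) -> Vector:
    """Global average pooling over spatial positions."""
    n = len(fmap)
    return [sum(col) / n for col in zip(*fmap)]


def cosine(u: Vector, v: Vector) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def rank_candidates(query: FeatureMap, text: Vector, candidates: List[FeatureMap]) -> List[int]:
    """Return candidate indices sorted from most to least similar to the composed query."""
    q = pool(compose_query(query, text))
    scores = [cosine(q, pool(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]  # toy query: two regions, two channels
    text = [0.0, 2.0]                 # toy modification emphasizing channel 1
    candidates = [
        query,                        # the unmodified query image
        [[1.0, 0.0], [0.0, 2.0]],     # only region 1 changed on channel 1
        [[0.0, 2.0], [0.0, 2.0]],     # every region changed
    ]
    print(rank_candidates(query, text, candidates))  # [1, 0, 2]
```

Because each update is weighted by both the spatial weight and the channel gate, regions and channels the text barely attends to change little, so the candidate with only a localized change ranks above both the untouched query image and the globally changed one. This mirrors the abstract's motivation for modifying local rather than global features.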
Keywords/Search Tags: Image retrieval, multimodal feature fusion, attention mechanism, text manipulation, local feature modification