Research On Multimodal Learning Algorithm Based On Image And Text Retrieval

Posted on: 2020-10-06    Degree: Master    Type: Thesis
Country: China    Candidate: J P Yao    Full Text: PDF
GTID: 2428330602952377    Subject: Communication and Information System
Abstract/Summary:
With the advent of the big data era, data of different modalities, such as text, images, and video, have grown rapidly on the Internet. These multimodal data describe the same event from multiple perspectives and carry rich complementary information, making people's perception of events more comprehensive. To make better use of such data, researchers have modeled multimodal data and proposed a variety of effective multimodal learning algorithms, giving rise to popular research directions such as cross-modal retrieval.

Two problems remain in multimodal deep learning. First, multimodal data often carry noisy or missing labels, and manual verification is costly, so a method that can effectively exploit noisy labels for multimodal learning is urgently needed. Second, existing correlation mining methods for multimodal data mainly operate at a single level and can therefore capture only part of the hierarchical association; a more comprehensive multi-level correlation mining algorithm is required to capture the complex correlations between multimodal data. This thesis therefore studies how to use the noisy labels of multimodal datasets for effective multimodal learning, and how to mine data correlations at multiple levels and apply them to cross-modal retrieval systems.

First, this thesis proposes a noisy-label cleaning and prediction method for multimodal datasets. The network consists of an image embedding sub-network, a text embedding sub-network, a fusion layer, and a non-linear layer. Using a weakly supervised strategy, it learns a mapping from multimodal content features to the label semantic space from the small portion of verified labels in the dataset, and the learned mapping is then used to clean and predict noisy labels. To verify the effectiveness of the proposed network, a classification network based on multimodal data is designed, and the classification result is used to judge how well the noisy labels have been processed. Experiments show that the proposed method improves classification accuracy by nearly 3.5% compared with existing methods fine-tuned only on verified labels.

Next, this thesis proposes a multi-level correlation mining method, MLCM, for multimodal data and applies it to a cross-modal retrieval system. By constructing a multi-level correlation learning network, correlation mining is performed between different feature layers of the different modalities, overcoming the limitation of learning correlations only in a low-level feature space or only in a high-level semantic feature space. In addition, within the cross-modal retrieval system, the label-cleaning network proposed above is used to exploit label information during training, so that both inter-modality and intra-modality correlations are fully mined. Experimental results show that, compared with MCNN-based algorithms, the proposed method effectively improves retrieval accuracy: on the Flickr30K dataset, R@10 for image retrieval increases by 1.2% and R@10 for sentence retrieval by 2.6%.

The multimodal label-cleaning network and the multi-level correlation mining algorithm proposed in this thesis can be widely applied to noisy-label processing and cross-modal retrieval systems.
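As a rough illustration of the noisy-label cleaning network described above, the following PyTorch sketch wires an image embedding sub-network, a text embedding sub-network, a fusion layer, and a non-linear layer that maps fused features into the label semantic space. All layer sizes, the multi-label loss, and the cleaning threshold are illustrative assumptions, not the thesis's actual configuration.

```python
# Hypothetical sketch of the noisy-label cleaning/prediction network.
import torch
import torch.nn as nn

class LabelCleaningNet(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512, num_labels=1000):
        super().__init__()
        # Image embedding sub-network (e.g. on top of pre-extracted CNN features)
        self.img_embed = nn.Sequential(nn.Linear(img_dim, embed_dim), nn.ReLU())
        # Text embedding sub-network (e.g. on top of averaged word vectors)
        self.txt_embed = nn.Sequential(nn.Linear(txt_dim, embed_dim), nn.ReLU())
        # Fusion layer: concatenate the two modality embeddings and project
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)
        # Non-linear layer mapping fused features to the label semantic space
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(embed_dim, num_labels))

    def forward(self, img_feat, txt_feat):
        fused = self.fusion(torch.cat([self.img_embed(img_feat),
                                       self.txt_embed(txt_feat)], dim=1))
        return self.classifier(fused)  # label logits

# Weak supervision: train on the verified subset, then clean noisy labels.
model = LabelCleaningNet()
criterion = nn.BCEWithLogitsLoss()  # multi-label setting assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def clean_labels(img_feat, txt_feat, threshold=0.5):
    """Predict labels for noisy samples; keep those above the threshold."""
    with torch.no_grad():
        probs = torch.sigmoid(model(img_feat, txt_feat))
    return (probs > threshold).float()
```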
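The multi-level correlation idea can likewise be sketched as projections from several feature levels of each modality into a shared space, with a ranking loss applied per level so that correlation is mined at low, mid, and high levels rather than in a single feature space. This is only a schematic interpretation under assumed feature dimensions, not the MLCM implementation from the thesis.

```python
# Illustrative multi-level cross-modal correlation learning (not the thesis's MLCM).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelAlign(nn.Module):
    def __init__(self, img_dims=(256, 512, 2048), txt_dims=(300, 512, 1024), common=256):
        super().__init__()
        # One projection per feature level and per modality into a common space
        self.img_proj = nn.ModuleList(nn.Linear(d, common) for d in img_dims)
        self.txt_proj = nn.ModuleList(nn.Linear(d, common) for d in txt_dims)

    def forward(self, img_feats, txt_feats):
        # img_feats / txt_feats: lists of per-level feature tensors of shape (batch, dim)
        sims = []
        for proj_i, proj_t, fi, ft in zip(self.img_proj, self.txt_proj, img_feats, txt_feats):
            vi = F.normalize(proj_i(fi), dim=1)
            vt = F.normalize(proj_t(ft), dim=1)
            sims.append(vi @ vt.t())  # cosine similarity matrix for this level
        return sims

def multilevel_ranking_loss(sims, margin=0.2):
    """Sum a bidirectional hinge ranking loss over all feature levels."""
    total = 0.0
    for s in sims:
        pos = s.diag().unsqueeze(1)                     # matched image-text pairs
        cost_img = (margin + s - pos).clamp(min=0)      # image-to-text direction
        cost_txt = (margin + s - pos.t()).clamp(min=0)  # text-to-image direction
        mask = torch.eye(s.size(0), dtype=torch.bool, device=s.device)
        total = total + cost_img.masked_fill(mask, 0).mean() \
                      + cost_txt.masked_fill(mask, 0).mean()
    return total
```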
Keywords/Search Tags:Multimodal Deep Learning, Noisy Label Processing, Multi-level Correlation Mining, Cross-modal Correlation