Research On Multimodal Learning Algorithm Based On Image And Text Retrieval

Posted on: 2020-10-06    Degree: Master    Type: Thesis
Country: China    Candidate: J P Yao    Full Text: PDF
GTID: 2428330602952377    Subject: Communication and Information System
Abstract/Summary:
With the advent of the big data era, data of different modalities, such as text, images, and video, have grown rapidly on the Internet. These multimodal data describe the same event from multiple perspectives and carry rich complementary information, making people's perception of events more comprehensive. To make better use of such data, researchers have modeled multimodal data and proposed a variety of effective multimodal learning algorithms, giving rise to popular research directions such as cross-modal retrieval.

Two problems remain in multimodal deep learning. First, multimodal data often carry noisy or missing labels, and manual verification is costly, so a method that can effectively exploit noisy labels for multimodal learning is urgently needed. Second, existing correlation mining methods for multimodal data mainly operate at a single level and can therefore capture only part of the hierarchical association; a more comprehensive multi-level correlation mining algorithm is required to capture the complex correlations between multimodal data. This thesis therefore studies how to use the noisy labels of multimodal datasets for effective multimodal learning, and how to mine data correlations at multiple levels and apply them to cross-modal retrieval systems.

First, this thesis proposes a noisy-label cleaning and prediction method for multimodal datasets. The network consists of an image embedding sub-network, a text embedding sub-network, a fusion layer, and a non-linear layer. Using a weakly supervised strategy, it learns a mapping from multimodal content features to the label semantic space from the small portion of verified labels in the dataset, and the learned mapping is then used to clean and predict noisy labels. To verify the effectiveness of the proposed network, a classification network based on multimodal data is designed, and the classification result is used to judge how well the noisy labels have been processed. Experiments show that the proposed method improves classification accuracy by nearly 3.5% compared with existing methods fine-tuned only on verified labels.

Next, this thesis proposes a multi-level correlation mining method, MLCM, for multimodal data and applies it to a cross-modal retrieval system. By constructing a multi-level correlation learning network, correlation mining is performed between different feature layers of the different modalities, overcoming the limitation of learning correlations only in a low-level feature space or only in a high-level semantic feature space. In addition, within the cross-modal retrieval system, the label-cleaning network proposed above is used to exploit label information during training, so that both inter-modality and intra-modality correlations are fully mined. Experimental results show that, compared with MCNN-based algorithms, the proposed method effectively improves retrieval accuracy: on the Flickr30K dataset, R@10 for image retrieval increases by 1.2% and R@10 for sentence retrieval by 2.6%.

The multimodal label-cleaning network and the multi-level correlation mining algorithm proposed in this thesis can be widely applied to noisy-label processing and cross-modal retrieval systems.
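As a rough illustration of the noisy-label cleaning network described above, the following PyTorch sketch wires an image embedding sub-network, a text embedding sub-network, a fusion layer, and a non-linear layer that maps fused features into the label semantic space. All layer sizes, the multi-label loss, and the cleaning threshold are illustrative assumptions, not the thesis's actual configuration.

```python
# Hypothetical sketch of the noisy-label cleaning/prediction network.
import torch
import torch.nn as nn

class LabelCleaningNet(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512, num_labels=1000):
        super().__init__()
        # Image embedding sub-network (e.g. on top of pre-extracted CNN features)
        self.img_embed = nn.Sequential(nn.Linear(img_dim, embed_dim), nn.ReLU())
        # Text embedding sub-network (e.g. on top of averaged word vectors)
        self.txt_embed = nn.Sequential(nn.Linear(txt_dim, embed_dim), nn.ReLU())
        # Fusion layer: concatenate the two modality embeddings and project
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)
        # Non-linear layer mapping fused features to the label semantic space
        self.classifier = nn.Sequential(nn.ReLU(), nn.Linear(embed_dim, num_labels))

    def forward(self, img_feat, txt_feat):
        fused = self.fusion(torch.cat([self.img_embed(img_feat),
                                       self.txt_embed(txt_feat)], dim=1))
        return self.classifier(fused)  # label logits

# Weak supervision: train on the verified subset, then clean noisy labels.
model = LabelCleaningNet()
criterion = nn.BCEWithLogitsLoss()  # multi-label setting assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def clean_labels(img_feat, txt_feat, threshold=0.5):
    """Predict labels for noisy samples; keep those above the threshold."""
    with torch.no_grad():
        probs = torch.sigmoid(model(img_feat, txt_feat))
    return (probs > threshold).float()
```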
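The multi-level correlation idea can likewise be sketched as projections from several feature levels of each modality into a shared space, with a ranking loss applied per level so that correlation is mined at low, mid, and high levels rather than in a single feature space. This is only a schematic interpretation under assumed feature dimensions, not the MLCM implementation from the thesis.

```python
# Illustrative multi-level cross-modal correlation learning (not the thesis's MLCM).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelAlign(nn.Module):
    def __init__(self, img_dims=(256, 512, 2048), txt_dims=(300, 512, 1024), common=256):
        super().__init__()
        # One projection per feature level and per modality into a common space
        self.img_proj = nn.ModuleList(nn.Linear(d, common) for d in img_dims)
        self.txt_proj = nn.ModuleList(nn.Linear(d, common) for d in txt_dims)

    def forward(self, img_feats, txt_feats):
        # img_feats / txt_feats: lists of per-level feature tensors of shape (batch, dim)
        sims = []
        for proj_i, proj_t, fi, ft in zip(self.img_proj, self.txt_proj, img_feats, txt_feats):
            vi = F.normalize(proj_i(fi), dim=1)
            vt = F.normalize(proj_t(ft), dim=1)
            sims.append(vi @ vt.t())  # cosine similarity matrix for this level
        return sims

def multilevel_ranking_loss(sims, margin=0.2):
    """Sum a bidirectional hinge ranking loss over all feature levels."""
    total = 0.0
    for s in sims:
        pos = s.diag().unsqueeze(1)                     # matched image-text pairs
        cost_img = (margin + s - pos).clamp(min=0)      # image-to-text direction
        cost_txt = (margin + s - pos.t()).clamp(min=0)  # text-to-image direction
        mask = torch.eye(s.size(0), dtype=torch.bool, device=s.device)
        total = total + cost_img.masked_fill(mask, 0).mean() \
                      + cost_txt.masked_fill(mask, 0).mean()
    return total
```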
Keywords/Search Tags:Multimodal Deep Learning, Noisy Label Processing, Multi-level Correlation Mining, Cross-modal Correlation