Research On Micro-videos Deep Multimodal Association Representation Learning And Its Applications

Posted on: 2022-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Li
Full Text: PDF
GTID: 2558307154476104
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the spread of smartphones and the mobile Internet, micro-videos have developed rapidly as a new form of user-generated content (UGC), and browsing them has become one of the most popular forms of entertainment. Micro-videos naturally exhibit modal and semantic associations, and making full use of these associations is the key to micro-video representation learning. In addition, effectively characterizing and encoding large-scale micro-video collections for intelligent analysis is of practical significance for both micro-video platforms and users. From the perspective of multimodal representation learning, this thesis therefore carries out the following research.

To capture the associations among the multimodal features and semantic labels of micro-videos, this thesis proposes a micro-video multi-label classification method based on bi-directional encoding networks. The method integrates modal fusion in the sample space and semantic association learning in the label space into a unified framework. Specifically, in the sample space, intra-modal and inter-modal encoding networks remove noise and redundant information to obtain a common and complete multimodal fused representation. In the semantic label space, a graph convolutional network learns representations of the semantic labels, which are used to guide the multi-label classification task. The reconstruction loss and the multi-label classification loss are combined into a single objective. Experiments on a large-scale micro-video database verify the effectiveness of the proposed method on the multi-label classification task.

To mine discriminative hash code representations, this thesis proposes a deep hashing method based on multimodal subspace representation learning. The method first obtains discriminative intrinsic representations of the samples in the real-valued space, and then obtains the corresponding hash codes by mapping these representations into the binary space. Specifically, in the sample space, subspace representation learning exploits the consistency and complementarity of the modalities to obtain a unified feature representation. In the semantic label space, association representations are learned to improve the discriminative ability of the feature representations. On this basis, a hash learning term is introduced so that the discriminative real-valued representations better guide the hash encoding process. Experiments on the micro-video retrieval task verify the effectiveness of the proposed method.
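
The final step of the hashing pipeline described above, mapping real-valued representations into binary space and using the codes for retrieval, can be illustrated with a minimal sketch. The abstract does not specify how the thesis binarizes representations or ranks results, so sign thresholding and Hamming-distance ranking are assumed here as common choices, and all function names and the random features are hypothetical stand-ins rather than the thesis's implementation.

```python
import numpy as np


def to_hash_codes(real_valued):
    """Binarize real-valued representations into {-1, +1} codes.

    Assumption: plain sign thresholding at zero; the thesis may use a
    different (learned) binarization.
    """
    return np.where(np.asarray(real_valued) >= 0, 1, -1).astype(np.int8)


def hamming_distances(query_code, database_codes):
    """Hamming distance between one code and every row of database_codes.

    For codes in {-1, +1} of length K, the distance equals (K - <q, b>) / 2.
    """
    q = np.asarray(query_code, dtype=np.int32)
    db = np.asarray(database_codes, dtype=np.int32)
    k = q.shape[0]
    return (k - db @ q) // 2


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database items closest in Hamming distance."""
    distances = hamming_distances(query_code, database_codes)
    return np.argsort(distances, kind="stable")[:top_k]


# Hypothetical stand-in for learned real-valued micro-video representations.
rng = np.random.default_rng(0)
database_features = rng.standard_normal((100, 32))
database_codes = to_hash_codes(database_features)

# Query with the code of item 7 and rank the database against it.
query_code = database_codes[7]
ranking = retrieve(query_code, database_codes, top_k=5)
```

Binary codes make retrieval cheap because distances reduce to integer dot products (or bit operations when the codes are packed), which is the usual motivation for hashing large collections.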
Keywords/Search Tags:Multimodal Fusion, Multi-label Classification, Hash Learning, Micro-videos