Research On Micro-videos Deep Multimodal Association Representation Learning And Its Applications

Posted on:2022-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Li

GTID:2558307154476104

Subject:Information and Communication Engineering

Abstract/Summary:

In recent years, with the popularization of smartphones and the mobile Internet, micro-videos have developed rapidly as a new form of user-generated content (UGC). Browsing micro-videos has become one of the most popular forms of entertainment. Micro-videos naturally exhibit modal and semantic associations, and making full use of these associations is the key to micro-video representation learning. In addition, effectively characterizing and encoding large-scale micro-videos for intelligent analysis is of practical significance for both micro-video platforms and users. Therefore, from the perspective of multimodal representation learning, this thesis conducts the following research.

To capture the associations among the multimodal features and semantic labels of micro-videos, this thesis proposes a micro-video multi-label classification method based on bi-directional encoding networks. The method integrates modal fusion in the sample space and semantic association learning in the label space into a unified framework. Specifically, in the sample space, intra-modal and inter-modal encoding networks are used to remove noise and redundant information and to obtain a common and complete representation of the fused modalities. In the semantic label space, a graph convolutional neural network is used to learn representations of the semantic labels, which guide the multi-label classification task. The reconstruction loss and the multi-label classification loss are combined into a single objective. Experiments on a large-scale micro-video database verify the effectiveness of the proposed method on the multi-label classification task.

To mine discriminative hash code representations, this thesis proposes a deep hashing method based on multimodal subspace representation learning. The method first obtains discriminative intrinsic representations of the samples in the real-valued space, and then obtains the corresponding hash codes by mapping these representations into the binary space. Specifically, in the sample space, subspace representation learning is used to exploit the consistency and complementarity of the modalities and obtain a unified feature representation. In the semantic label space, association representations are obtained to improve the discriminative ability of the feature representations. On this basis, a hash learning term is introduced so that the discriminative representations in the real-valued space better guide the hash encoding process. Experiments on the micro-video retrieval task verify the effectiveness of the proposed method.
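The step of mapping real-valued representations into binary space, and then using the resulting codes for retrieval, can be illustrated with a minimal sketch. The abstract does not state how the thesis binarizes its representations, so the zero-threshold rule below is an assumption (a common choice in hashing work), and the feature values, function names, and 4-bit code length are all hypothetical.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map a real-valued vector to a {0, 1} code by thresholding at zero.

    Assumption: zero-thresholding stands in for whatever mapping the thesis uses.
    """
    return [1 if value >= 0.0 else 0 for value in features]


def hamming_distance(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical real-valued representations of three micro-videos and one query.
database_features = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.8, 0.3, -0.5, 0.6],
    [0.7, -0.1, -0.3, -0.9],
]
query_features = [0.8, -0.4, 0.5, -0.6]

database_codes = [binarize(f) for f in database_features]
query_code = binarize(query_features)
ranking = rank_by_hamming(query_code, database_codes)
```

Here retrieval reduces to comparing short binary codes, which is why discriminative real-valued representations matter: samples that are close before binarization tend to keep small Hamming distances afterwards.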

Keywords/Search Tags:

Multimodal Fusion, Multi-label Classification, Hash Learning, Micro-videos

Related items

1	Research On Deep Multi-modal Enhanced Representation Learning And Its Applications For Micro-videos
2	Research And Implementation Of Multimodal Micro-video Classification Method
3	Research On Multi-label Classification Algorithms Based On Samples And Property Analysis
4	Research On Videos Quality Classification Algorithm Based On Deep Learning
5	Multi-label Classification Of Captioned Images Based On Deep Learning
6	Research On Multi-label Classification Algorithm Based On Label Relationship
7	Research On Multi-label Chinese Webpage Classification Models Based On Multi-information Fusion Deep Learning
8	Research On Micro-video Multi-label Classification Based On Deep Matrix Factorization
9	Multimodal Hand Feature Fusion Recognition
10	Research On The Multi-label Classification Methods With The Label Embedding And Structure Information