
Research On Micro-Video Popularity Prediction Based On Deep Multimodal Representation Learning

Posted on: 2022-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Ye
Full Text: PDF
GTID: 2558307154976819
Subject: Electronic information
Abstract/Summary:
With the rapid development of the mobile Internet and the spread of smartphones, the volume of micro-video data is growing quickly. Only a small fraction of micro-videos become popular, attracting large numbers of views, likes, comments, and reposts; most are quickly forgotten. Micro-video data is multimodal and semantically rich, so its modalities exhibit both correlation and independence. It is therefore necessary to learn effective feature representations that support semantic understanding of micro-videos. Focusing on multimodal feature representation and prediction for micro-videos, this thesis makes the following contributions to micro-video popularity prediction.

First, this thesis proposes a micro-video popularity prediction method based on deep multimodal fusion. The method uses a self-attention network to explore the correlations among modalities, addressing the problems of unbalanced feature dimensions and missing data. The batch normalization layers in the method are used for deep channel exchange: the batch normalization scaling factors guide the dynamic exchange of modal information between the sub-networks of different modalities. Fusing the modalities yields a unified feature representation that better characterizes a micro-video. Experimental results on a public dataset demonstrate the effectiveness of the model.

Second, this thesis proposes a micro-video popularity prediction model based on a bidirectional deep encoding network, which considers both multimodal fusion and unimodal supervision and integrates them into a single bidirectional deep encoding network. The multimodal fusion module exploits the relevance between modalities to handle dimensionality differences and missing data and so obtain a better feature representation. The unimodal supervision module exploits the differences between modalities to supervise the fusion of multimodal features. The multimodal fusion and unimodal supervision tasks are trained jointly, so that the model fully learns both the consistency and the differences of multimodal information, which improves the generalization ability of the algorithm. Extensive experiments demonstrate the effectiveness of the proposed model.
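
The channel exchange idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `exchange_channels`, the two-modality setup, the `threshold` value, and the rule "replace a channel whose scaling factor is near zero with the other modality's channel" are all assumptions chosen for illustration, and each channel is reduced to a single number to keep the example in plain Python.

```python
from typing import List, Tuple


def exchange_channels(
    feats_a: List[float],
    feats_b: List[float],
    gamma_a: List[float],
    gamma_b: List[float],
    threshold: float = 0.01,  # hypothetical cut-off, not taken from the thesis
) -> Tuple[List[float], List[float]]:
    """Swap low-importance channels between two modality sub-networks.

    feats_a / feats_b: one value per channel for each modality (assumed to be
        the outputs of a batch normalization layer in each sub-network).
    gamma_a / gamma_b: the batch normalization scaling factor of each channel.
    A channel whose |gamma| is below `threshold` is treated as uninformative
    and is replaced by the same channel from the other modality.
    """
    if not (len(feats_a) == len(feats_b) == len(gamma_a) == len(gamma_b)):
        raise ValueError("all inputs must have the same number of channels")

    out_a, out_b = [], []
    for xa, xb, ga, gb in zip(feats_a, feats_b, gamma_a, gamma_b):
        # Decisions use the ORIGINAL features, so both swaps are symmetric.
        out_a.append(xb if abs(ga) < threshold else xa)
        out_b.append(xa if abs(gb) < threshold else xb)
    return out_a, out_b


if __name__ == "__main__":
    visual = [1.0, 2.0, 3.0]
    audio = [10.0, 20.0, 30.0]
    new_visual, new_audio = exchange_channels(
        visual, audio, gamma_a=[0.5, 0.001, 0.3], gamma_b=[0.0, 0.4, 0.2]
    )
    print(new_visual, new_audio)
```

In this sketch the scaling factor acts as a learned per-channel importance score, so information flows into a sub-network only at the channels that sub-network has itself learned to suppress.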
Keywords/Search Tags: Modal relevance, Multi-modal fusion, Feature representation, Micro-video, Popularity prediction