Research On Dynamic Multimodal Information Fusion For Personalized Micro-Video Recommendation

Posted on: 2024-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: C F Tian
GTID: 2568307076974739
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
In recent years, the popularity of mobile devices and the rapid development of micro-video platforms have led to explosive growth in the number of micro-videos, which has attracted extensive research interest in personalized micro-video recommendation. However, current micro-video recommendation methods have two main problems. (1) Most personalized video recommendation methods rely on single-modal data and rarely use multimodal data. Micro-videos, however, contain heterogeneous multimodal information that describes the same content from different perspectives, and a user's preference for a micro-video may be reflected in different modalities, so multimodal information must be combined to achieve personalized recommendation. Methods built on single-modal information are often difficult to apply directly to multimodal micro-video recommendation. (2) Although some existing works use multimodal information to represent videos and users, they simply concatenate the features extracted from the different modalities. In the subsequent modal fusion, they do not distinguish how important each modality is for representing each video and each user. That is, they ignore that the three modal representations differ in importance for a given video, and that users differ in their preference for each modality, so each modality influences the user representation to a different degree.

To solve these problems, this thesis proposes a novel multimodal recommendation method that differs from traditional methods. Traditional methods usually embed and fuse low-dimensional multimodal features and item IDs through concatenation, or use attention mechanisms to capture user preferences for items. This thesis instead proposes coarse-grained and fine-grained modeling methods from two aspects: user preference for different modal information, and dynamic multimodal information fusion. Specifically, it studies the user's preference for different modal information and, through coarse-grained and fine-grained modeling, strengthens the influence of preference-aware modal representation learning on video feature embedding, so that the latent salient features of a video can be extracted. It also explores a dynamic multimodal fusion scheme for micro-video recommendation. The coarse-grained and fine-grained dynamic multimodal information fusion models learn the weight of each modality autonomously, adaptively measure the importance of each modality, and assign each weight to the corresponding modality to achieve modal fusion.

Both proposed methods can be integrated into existing multimodal micro-video recommendation systems to improve their recommendation accuracy. This thesis integrates the proposed methods into existing recommendation systems and conducts extensive experiments on two public datasets. The experimental results show that the modal encoding and fusion strategies proposed here further improve the performance of existing multimodal micro-video recommendation methods, and confirm that the proposed methods can significantly improve the accuracy of micro-video recommendation, thus providing better recommendation services for micro-video platforms.

In summary, this study addresses personalized micro-video recommendation, focusing on user and video representation based on dynamic multimodal information fusion, with the aim of constructing a more accurate personalized micro-video recommendation algorithm. With the proposed methods, micro-videos can be recommended more accurately to interested users, improving the platform's revenue and user experience as well as the efficiency of video distribution and acquisition. In addition, the dynamic multimodal information fusion mechanism studied in this thesis is not limited to micro-video recommendation. It can be extended to other micro-video analysis tasks, such as micro-video classification and micro-video retrieval, where it helps to better understand and characterize micro-video content, improve task accuracy, and provide better support services for video platforms.
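The abstract does not state the fusion formula, so the following is only a minimal sketch of the general idea of weighting modalities instead of concatenating them. It assumes three modalities named visual, acoustic and textual (the abstract says three but does not name them), a hypothetical gate vector per modality standing in for parameters that would be learned during training, and a softmax over dot-product scores as the weighting rule. None of these names or choices are taken from the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, gates):
    """Fuse per-modality feature vectors with adaptive weights.

    features: dict mapping modality name -> feature vector (equal lengths).
    gates:    dict mapping modality name -> gate vector (hypothetical
              stand-in for learned parameters; an assumption of this sketch).
    Returns the fused vector and the weight given to each modality.
    """
    names = sorted(features)
    # One scalar importance score per modality: gate . feature
    scores = [
        sum(g * x for g, x in zip(gates[name], features[name]))
        for name in names
    ]
    weights = softmax(scores)
    dim = len(features[names[0]])
    # Weighted sum of the modality vectors, dimension by dimension
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Assumed modality names and toy feature values, for illustration only
features = {
    "visual": [1.0, 0.0],
    "acoustic": [0.0, 1.0],
    "textual": [1.0, 1.0],
}
# All-zero gates give every modality the same score, hence equal weights
uniform_gates = {name: [0.0, 0.0] for name in features}
fused, weights = fuse_modalities(features, uniform_gates)
```

With all-zero gates the weights are equal and the result reduces to a plain average of the modality vectors. Non-zero gates shift weight toward the modalities whose features score higher, which is the adaptive behaviour the abstract contrasts with direct concatenation.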
Keywords/Search Tags: Multi-modal micro-video recommendation, preference-aware multimodal representation, dynamic multimodal fusion