
Research On Online Recommendation Based On Contextual Combinatorial Bandit Algorithm

Posted on: 2023-01-19    Degree: Master    Type: Thesis
Country: China    Candidate: H X Han    Full Text: PDF
GTID: 2568307076985479    Subject: Software engineering
Abstract/Summary:
Recommender systems play a crucial role in the Internet age of information overload. The recommendation process must trade off two goals: exploring new items to maximize user satisfaction, and exploiting items the user has already interacted with to match known interests. This problem is widely recognized as the exploration-exploitation (EE) dilemma. Bandit algorithms have proven to be an effective solution and have therefore been widely studied and applied in the recommendation field. As the scale of users and items in real-world applications grows, formalizing online recommendation as a bandit problem poses three challenges. First, sparse interactions between users and items make it difficult to mine user preferences. Second, as items are continually added to the system, modeling each single item as an arm leads to a large-scale arm space. Third, the widely used Bernoulli reward mechanism does not take full advantage of the rich implicit feedback available in recommender systems.

To address these problems, this thesis proposes the dynamic clustering-based contextual combinatorial bandit algorithm (DC³MAB), which consists of three key components: a dynamic user clustering strategy, an item partitioning approach, and a multi-class reward mechanism. Specifically, to accurately mine user preferences from sparse interaction behavior, the dynamic user clustering strategy groups users with similar preferences into the same cluster, and users in the same cluster share bandit parameters. To cope with the unknown reward distribution of each arm and the high computational complexity caused by large-scale arms, the item partitioning approach quickly filters a few subsets of items from the full item set based on current interactions and models each item subset as an arm. To capture users' latent preferences from complex interaction behaviors, the multi-class reward mechanism assigns different reward weights to different interaction types, reflecting different levels of preference for recommended items.

Building on clustering bandits, the thesis further proposes the collaborative combinatorial bandit (CoCoB) algorithm, which uses a two-sided bandit design to achieve adaptive user clustering. Specifically, the user-bandit is based on an improved Bayesian bandit that models users as arms in order to explore similarity between users. By comparing a similarity probability against a threshold, it judges whether the target user has neighboring users, and adaptively draws on either the neighbors' preferences or the user's personal preferences to guide recommendation decisions. The item-bandit models items as arms and leverages the preference information from the user-bandit to produce a list of recommendations at each round, increasing recommendation variety.

Extensive experiments on three real-world datasets demonstrate the superiority of the proposed DC³MAB and CoCoB over state-of-the-art bandit baselines.
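The abstract does not give DC³MAB's update rules, so the following is only a minimal Python sketch of how its three components could fit together: a LinUCB-style contextual bandit whose parameters are shared by one user cluster, pre-filtered item subsets treated as arms, and graded (multi-class) rewards in place of 0/1 Bernoulli feedback. The class names, the REWARD_WEIGHTS values, and the subset-scoring rule are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

# Hypothetical reward weights per interaction type; the thesis assigns
# different weights to different implicit-feedback signals, but the
# actual values are not given in the abstract.
REWARD_WEIGHTS = {"view": 0.2, "click": 0.5, "favorite": 0.8, "purchase": 1.0, "none": 0.0}

class ClusterLinUCB:
    """LinUCB-style bandit whose parameters are shared by one user cluster."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)     # ridge-regression Gram matrix
        self.b = np.zeros(dim)   # reward-weighted feature sum
        self.alpha = alpha       # exploration strength

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        # mean estimate plus an upper-confidence exploration bonus
        return x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, interaction):
        r = REWARD_WEIGHTS[interaction]  # multi-class reward, not 0/1
        self.A += np.outer(x, x)
        self.b += r * x

def recommend(cluster_bandit, item_subsets, k):
    """Treat each pre-filtered item subset as one arm: pick the subset with
    the highest mean UCB score, then return its top-k items."""
    best = max(item_subsets,
               key=lambda s: np.mean([cluster_bandit.score(x) for _, x in s]))
    ranked = sorted(best, key=lambda it: cluster_bandit.score(it[1]), reverse=True)
    return ranked[:k]
```

Here each element of `item_subsets` is assumed to be a list of `(item_id, feature_vector)` pairs; scoring a handful of subsets rather than every individual item is what keeps the per-round cost low under a large-scale arm space.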
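Similarly, a hedged sketch of CoCoB's two-sided design, assuming the "improved Bayesian bandit" behaves like a Beta-Bernoulli Thompson sampler: users are arms, a sampled similarity probability is compared against a threshold, and the recommender either borrows neighbors' preferences or falls back to the target user's own. The threshold value, the agreement-based update, and all names are hypothetical.

```python
import numpy as np

class UserBandit:
    """Beta-Bernoulli bandit over *users as arms*: it estimates the
    probability that another user shares the target user's preferences."""
    def __init__(self, n_users, threshold=0.5):
        self.a = np.ones(n_users)      # pseudo-counts of agreements
        self.b = np.ones(n_users)      # pseudo-counts of disagreements
        self.threshold = threshold     # similarity-probability threshold

    def sample_neighbors(self, target):
        # Thompson-sample a similarity probability for every other user
        p = np.random.beta(self.a, self.b)
        p[target] = 0.0
        return np.where(p > self.threshold)[0]

    def update(self, other, agreed):
        # agreed is 1 if the other user's feedback matched the target's
        self.a[other] += agreed
        self.b[other] += 1 - agreed

def user_preference(target, user_bandit, pref_vectors):
    """If sampled neighbors exceed the similarity threshold, borrow their
    averaged preferences; otherwise fall back to personal preferences."""
    neighbors = user_bandit.sample_neighbors(target)
    if len(neighbors) > 0:
        return pref_vectors[neighbors].mean(axis=0)
    return pref_vectors[target]
```

The returned preference vector would then feed the item-bandit, which scores items as arms and emits a list of recommendations per round; that combinatorial step is analogous to the subset scoring sketched above.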
Keywords/Search Tags:personalized recommendation, contextual multi-armed bandit, exploration-exploitation, collaborative filtering