With the rise of major e-commerce companies, a central goal of e-commerce platforms is to persuade more customers to purchase products on their websites. Recommender systems are widely used as an effective information-filtering tool in both industry and academia, and they have become one of the most active research areas. In recommender systems, modeling user-item behavior is crucial for user representation learning. Existing work faces two challenges.

First, most existing sequential recommendation algorithms do not consider the diversity of recommendations; they capture only the sequential correlation among historically interacted items to model a user's historical preferences. However, because user preferences evolve over time and are diverse by nature, modeling historical preferences without understanding their time-evolving trends may be inferior for recommending complementary or fresh items, leaving users in an "information cocoon" and compromising the effectiveness of the recommender system.

Second, in item recommendation, the actions in deep reinforcement learning can be discrete or continuous, and the DQN algorithm is trained over a discrete action space. When the action space of a recommender system is continuous, the number of candidate items is often very large, so the dimensionality of the continuous action space is very high. The DQN algorithm merely uses neural networks to map the state-action value function from a discrete space to a continuous one, without actually solving the problem of selecting among continuous actions, which makes the model difficult to optimize and train.

To address these two challenges, this paper proposes a recommendation model that incorporates future preferences based on value improvement, RFPV (Recommender Future Preference model based on Value-improvement DRL). Based on the idea that the future actions that
the current user may take are the actions already taken by users similar to the current user, i.e., neighboring users, the future preferences of the user are learned from the user's historical interaction sequences. The algorithm considers not only the user's historical preferences but also the user's future preferences, and it is trained with the DQN algorithm to recommend items of interest to the user, accounting for the diversity of recommendations as well as their accuracy. The RFPP model is implemented on the Actor-Critic framework and adds Ornstein-Uhlenbeck noise to the Actor network so that actions are explored more fully. By using the DDPG algorithm, the RFPP model not only handles the continuity and high dimensionality of states and actions in complex and changing recommendation problems, but also exploits the structure of the problem domain to accelerate the convergence of the model.
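The limitation of DQN described above can be made concrete with a minimal sketch (assumed toy values, not part of the proposed models): a DQN head outputs one Q-value per discrete action, so greedy action selection is a simple argmax over an enumerable set. This enumeration is exactly what breaks down when the action is a continuous, high-dimensional vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy output of a DQN head for one state: one Q-value per discrete candidate item.
num_items = 10                      # assumed small catalogue for illustration
q_values = rng.normal(size=num_items)

# Greedy selection is feasible only because the actions can be enumerated.
greedy_action = int(np.argmax(q_values))

# A continuous action (e.g. a d-dimensional preference vector) cannot be chosen
# by enumeration: max_a Q(s, a) becomes an optimization problem in itself,
# which motivates an Actor network that outputs the action directly.
```

This is why the text contrasts DQN with the Actor-Critic / DDPG approach: the Actor replaces the argmax with a learned mapping from state to action.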
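The Ornstein-Uhlenbeck exploration noise mentioned above can be sketched as follows. This is a generic OU process as commonly used with DDPG, not the paper's implementation; the parameter values (`theta`, `sigma`, `dt`) are illustrative assumptions, and `actor_output` is a placeholder for the Actor network's action.

```python
import numpy as np

class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise added to the Actor's action for exploration.

    theta pulls the noise back toward mu (mean reversion); sigma scales the
    random perturbation. Parameter values here are illustrative assumptions.
    """

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu = mu * np.ones(dim)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = self.mu.copy()

    def reset(self):
        # Restart the process at the mean, e.g. at the start of an episode.
        self.state = self.mu.copy()

    def sample(self):
        # Euler-Maruyama step of dx = theta*(mu - x)*dt + sigma*sqrt(dt)*dW.
        dx = (self.theta * (self.mu - self.state) * self.dt
              + self.sigma * np.sqrt(self.dt)
              * self.rng.standard_normal(self.mu.shape))
        self.state = self.state + dx
        return self.state

noise = OrnsteinUhlenbeckNoise(dim=4)
actor_output = np.zeros(4)  # placeholder for the Actor network's deterministic action
exploratory_action = actor_output + noise.sample()
```

Because successive samples are correlated rather than independent, the perturbed actions drift smoothly, which is the usual rationale for OU noise over white Gaussian noise in continuous-control exploration.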