With the increasingly serious problem of information overload, it is becoming more and more difficult for users to find the content they are interested in among the massive and complex information on the Internet. Deep learning-based recommendation systems assist users by suggesting personalized items that best fit their needs and preferences, and have become an important and popular research field of machine learning. However, since deep learning-based methods cannot continuously update their strategies during interactions or maximize the expected long-term cumulative reward from users, reinforcement learning-based methods were proposed. Existing reinforcement learning-based recommendation methods still have difficulties in multi-behavior recommendation scenarios and suffer from low-quality embeddings of state information. To this end, this work improves reinforcement learning-based recommendation from the perspective of supervision signals. The main contributions of this paper are as follows:

(1) To cope with the complex interactive recommendation environment with multiple behaviors, we propose MTRL4Rec (Multi-Task Reinforcement Learning for multi-behavior Recommendation), a multi-task reinforcement learning-based multi-behavior recommendation method that uses a modular network and a task routing network to solve the multi-task problem in multi-behavior recommendation with reinforcement learning (see the routing sketch following this summary). We further provide a new version of MTRL4Rec based on the DDQN reinforcement learning algorithm, namely MTRL4Rec-DDQN, which further improves the performance and stability of the model (see the DDQN sketch below). Experimental results show that users indeed have different preferences for different behaviors, and that MTRL4Rec outperforms state-of-the-art models in multi-task multi-behavior recommendation scenarios. Experiments also verify that MTRL4Rec-DDQN achieves better performance and stability.

(2) To improve the quality of state representation in reinforcement learning-based recommendation, we introduce a self-supervised signal combined with the DDQN reinforcement learning method and propose a self-supervised state representation enhancement method, S3A-RLRec (Self-Supervised State Representation Augmentation for Reinforcement Learning Recommendation System), which uses a self-supervised approach to improve the quality of the model's state representations (see the contrastive-loss sketch below). In addition, a hierarchical joint decision-making method that considers category value is introduced into the model, and item category fine-tuning is adopted to further enhance the flexibility of hierarchical joint decision-making (a hierarchical decision sketch also follows). Experiments verify that the method performs well, and ablation experiments further validate the contribution of each part of the model.

(3) We study offline training algorithms for reinforcement learning, using accumulated offline user interaction data to train or pre-train our reinforcement learning-based recommendation models. We propose offline training algorithms for general scenarios and for multi-behavior, multi-task scenarios respectively, and use them to pre-train initializations for the MTRL4Rec and S3A-RLRec methods proposed above (an offline pre-training sketch follows). The experimental results show that offline training on offline data can effectively yield better initial model parameters, and that a model initialized from the pre-training results converges faster in subsequent online learning.
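To make the modular-network-plus-task-routing idea behind MTRL4Rec concrete, the following is a minimal sketch: shared expert modules encode the state, and a routing network mixes them with per-behavior (per-task) weights before a Q-value head. All module counts, layer sizes, and the soft-routing scheme here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class ModularQNetwork(nn.Module):
    """Shared expert modules combined per behavior (task) by a routing net."""

    def __init__(self, state_dim, n_actions, n_modules=4, n_tasks=3, hidden=64):
        super().__init__()
        # Shared expert modules: each maps the state to a hidden feature.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            for _ in range(n_modules)
        )
        # Task routing network: per-task soft weights over the modules.
        self.router = nn.Sequential(
            nn.Linear(state_dim + n_tasks, n_modules), nn.Softmax(dim=-1)
        )
        self.n_tasks = n_tasks
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, state, task_id):
        # A one-hot task embedding tells the router which behavior to score.
        task = nn.functional.one_hot(task_id, self.n_tasks).float()
        weights = self.router(torch.cat([state, task], dim=-1))  # (B, n_modules)
        feats = torch.stack([m(state) for m in self.experts], dim=1)  # (B, M, H)
        mixed = (weights.unsqueeze(-1) * feats).sum(dim=1)  # route and combine
        return self.q_head(mixed)  # Q-values over candidate items


net = ModularQNetwork(state_dim=32, n_actions=100)
q_values = net(torch.randn(8, 32), torch.randint(0, 3, (8,)))  # shape (8, 100)
```

Because the experts are shared and only the routing weights depend on the task, behaviors can reuse common knowledge while still specializing, which is the usual motivation for this family of multi-task architectures.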
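For the MTRL4Rec-DDQN variant, the stabilizing ingredient named in the text is the standard Double DQN target: the online network selects the next action and the target network evaluates it, which reduces the overestimation bias of vanilla Q-learning. This is a minimal sketch of that target; tensor shapes, the discount factor, and the network interfaces are assumptions.

```python
import torch


def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    """Compute y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        # Action selection by the online network (decoupled from evaluation).
        next_action = online_net(next_state).argmax(dim=-1, keepdim=True)
        # Action evaluation by the slowly updated target network.
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

# Usage: regress the online Q-value toward this target, e.g.
#   y = ddqn_target(r, s2, done, online_net, target_net)
#   loss = torch.nn.functional.mse_loss(
#       online_net(s).gather(1, a.unsqueeze(1)).squeeze(1), y)
```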
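The self-supervised signal in S3A-RLRec is described only at a high level above; one common way to realize such a signal is a contrastive (InfoNCE) auxiliary loss that pulls together two stochastically augmented views of the same interaction-history state. The dropout-based augmentation, temperature, and encoder interface below are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn.functional as F


def info_nce(encoder, states, temperature=0.1):
    """Contrastive auxiliary loss added to the RL (e.g. DDQN) objective."""
    # Two stochastic views of the same state, here via dropout noise.
    z1 = F.normalize(encoder(F.dropout(states, p=0.2, training=True)), dim=-1)
    z2 = F.normalize(encoder(F.dropout(states, p=0.2, training=True)), dim=-1)
    logits = z1 @ z2.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(states.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Joint objective sketch: total_loss = td_loss + beta * info_nce(encoder, s)
```

Training the state encoder with such an auxiliary loss alongside the TD loss is what "improving the quality of the state representation" amounts to in this kind of design: the encoder receives a denser learning signal than sparse rewards alone provide.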
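The hierarchical joint decision-making over category value and items can be pictured as a two-level argmax: score categories by their estimated value, score items within each category, and pick the pair that maximizes the joint score. The scoring functions, the additive joint score, and the category-to-items index below are illustrative assumptions.

```python
import torch


def hierarchical_recommend(state, category_q, item_q, items_by_category):
    """Jointly choose (category, item): category value plus the best
    item Q-value inside that category."""
    cat_values = category_q(state)   # tensor of shape (n_categories,)
    item_values = item_q(state)      # tensor of shape (n_items,)
    best_cat, best_item, best_score = None, None, float("-inf")
    for cat, items in items_by_category.items():
        # Joint score: category value + best item value within the category.
        top_item = max(items, key=lambda i: item_values[i].item())
        score = cat_values[cat].item() + item_values[top_item].item()
        if score > best_score:
            best_cat, best_item, best_score = cat, top_item, score
    return best_cat, best_item
```

Restricting the item argmax to the chosen category is what makes the decision hierarchical; fine-tuning the category assignments, as the text mentions, would adjust the `items_by_category` index rather than the decision rule itself.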
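Finally, the offline pre-training idea can be sketched as running DDQN-style updates over a fixed buffer of logged (state, action, reward, next state, done) tuples before any online interaction, then handing the resulting parameters to the online learner. The buffer format, epoch count, and hyperparameters are assumptions.

```python
import copy
import torch
import torch.nn.functional as F


def pretrain_offline(online_net, logged_batches, epochs=3, lr=1e-3, gamma=0.99):
    """Pre-train a Q-network on logged interaction data (no environment)."""
    target_net = copy.deepcopy(online_net)
    opt = torch.optim.Adam(online_net.parameters(), lr=lr)
    for _ in range(epochs):
        for s, a, r, s2, done in logged_batches:  # pre-collected offline data
            with torch.no_grad():
                # Double DQN target computed purely from logged transitions.
                a2 = online_net(s2).argmax(dim=-1, keepdim=True)
                y = r + gamma * (1.0 - done) * target_net(s2).gather(1, a2).squeeze(1)
            q = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = F.mse_loss(q, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Refresh the target network with the online weights once per epoch.
        target_net.load_state_dict(online_net.state_dict())
    return online_net  # parameters used to initialize subsequent online learning
```

The claimed benefit in the text, faster convergence during subsequent online learning, corresponds here to starting the online phase from the returned pre-trained parameters instead of a random initialization.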