| Interactive recommendation is a cutting-edge technology in the field of recommendation. Its defining feature is that it predicts user preferences from the interaction information exchanged between the recommendation agent and the user. In real-world scenarios, interactive recommendation can not only predict users’ current personalized needs but also uncover how those needs change through long-term interaction, and it can provide recommendations that follow these changes. Interactive recommendation systems based on deep reinforcement learning benefit from the way reinforcement learning algorithms derive policy improvements from behavioral feedback, which makes it easier for the recommendation agent to capture users’ changing preferences during long-term interaction. Such systems therefore achieve better recommendation performance than systems based on traditional methods. However, existing recommendation systems based on deep reinforcement learning mainly use the interaction information between users and recommendation agents and ignore the basic information about users and items. In addition, simple deep reinforcement learning algorithms with only a single network are susceptible to biased actions while collecting user feedback, which makes training difficult to converge and lowers the prediction accuracy of the recommendation system. To address these problems, this paper proposes a user-item collaborative recommendation model based on the Soft Actor-Critic (SAC) algorithm. First, this paper provides a method for embedding the basic feature information of users and items into the training process of the recommendation agent. The method uses this basic feature information to train embedding features that encode similarity among users and among items, so that a recommendation agent equipped with these features can better learn user requirements during deep reinforcement learning training and optimization. Second, to further improve recommendation performance, this paper designs an interactive recommendation architecture based on the SAC algorithm. The architecture uses the action-value network in the critic to evaluate the policy network in the actor, and reduces the impact of biased actions on the network by means of a self-updating entropy term. Finally, so that the model can learn users’ changing needs from offline datasets, this paper designs a method that mimics user behavior from offline data, allowing the recommendation agent to be trained on user feedback even in offline scenarios. Through model comparison, ablation, and hyperparameter experiments on the MovieLens 100K dataset, this paper demonstrates that both the proposed SAC-based recommendation method and the user-item collaborative recommendation method help the recommendation agent provide results that meet users’ personalized needs. It further examines how the clipped double-Q learning technique, the soft update of the target Q network, the embedding feature dimension, the self-updating temperature factor, and the number of hidden layers affect the performance of the recommendation system. |
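
The abstract names three SAC components (clipped double-Q learning, the soft update of the target Q network, and the self-updating temperature factor) without giving their formulas. The sketch below illustrates the standard textbook form of each on plain scalars, assuming the paper follows the usual SAC formulation; the function names, default values, and scalar inputs are hypothetical and are not taken from the paper's implementation.

```python
import math


def clipped_double_q_target(reward, q1_next, q2_next, log_prob_next,
                            alpha, gamma=0.99, done=False):
    """Soft Bellman target built from the smaller of two target-critic estimates.

    Taking the minimum of the two Q-values counteracts overestimation;
    the term -alpha * log_prob_next is the entropy bonus.
    (gamma and the scalar inputs are illustrative assumptions.)
    """
    min_q = min(q1_next, q2_next)
    soft_value = min_q - alpha * log_prob_next
    return reward + (0.0 if done else gamma * soft_value)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]


def temperature_step(log_alpha, log_prob, target_entropy, lr=1e-3):
    """One gradient-descent step on J(alpha) = -alpha * (log_prob + target_entropy).

    alpha is parameterized as exp(log_alpha) so that it stays positive.
    The temperature rises when the policy's entropy (-log_prob) falls
    below target_entropy, and falls otherwise.
    """
    alpha = math.exp(log_alpha)
    grad = -alpha * (log_prob + target_entropy)  # dJ / d(log_alpha)
    return log_alpha - lr * grad
```

In a full agent these operations would act on network outputs and parameter tensors over mini-batches; scalars are used here only to keep the arithmetic visible.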