
Research Of Game Intelligence Based On Improved Policy Gradient Method

Posted on: 2019-02-08  Degree: Master  Type: Thesis
Country: China  Candidate: S L Zhang  Full Text: PDF
GTID: 2428330566987565  Subject: Computational Mathematics
Abstract/Summary:
At present, researchers mostly focus on value-based reinforcement learning algorithms, represented by Deep Q Network (DQN), and neglect the policy-based approach, which is more solid in theory and more intuitive in its update process. In this paper, after studying algorithms such as Reinforce and analyzing their advantages and disadvantages, the 2ER-Reinforce algorithm is proposed: a variant of Reinforce that integrates entropy regularization and the experience replay technique. The effect of several important hyperparameters on the experimental results is discussed. Finally, the experimental results, including the network weights, are analyzed visually, and the performance of the 2ER-Reinforce algorithm is evaluated by training an agent that simulates a human playing Atari games. The main work of this paper is as follows:

(1) Introduce the research background and significance of deep reinforcement learning, and list its applications in games and commerce.

(2) Outline the concept of the Markov Decision Process and formulate the reinforcement learning framework, focusing on the Bellman equation and the dynamic programming solution method in model-based learning. The policy improvement methods of policy iteration and value iteration are derived, which provides the theoretical basis for model-free learning.

(3) Briefly introduce the basics of model-free learning in reinforcement learning, which mainly involves value-based algorithms. The similarities and differences between the Monte Carlo method and Temporal Difference learning are compared. The Sarsa algorithm, the Q-Learning algorithm, and the DQN algorithm, which improves on Q-Learning, are introduced.

(4) Put forward the improved 2ER-Reinforce algorithm and apply it to video games. The success of the experiment shows that the 2ER-Reinforce algorithm has practical significance. First, the advantages and disadvantages of the Reinforce algorithm are analyzed, and the 2ER-Reinforce algorithm is put forward based on experience replay and entropy regularization. The Atari game Pong is used as a test environment to compare the performance of four algorithms. The influence of hyperparameter values on the training effect is discussed. The performance of the trained game agent and a visual analysis of the trained policy network weights are presented. Finally, the ways in which the algorithm learns and humans learn are considered and compared.
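The abstract names the two ingredients combined with Reinforce (entropy regularization and experience replay) but does not give the update rule. The following is a minimal sketch of how those two ideas are commonly attached to a Reinforce-style gradient, not the thesis's 2ER-Reinforce algorithm itself. It assumes a softmax policy over a single state (a two-armed bandit instead of Pong); the names `reinforce_entropy_grad`, `train_bandit`, the entropy weight `beta`, and the buffer size are all hypothetical. Note that replaying old samples without an importance correction biases the gradient estimate; how the thesis deals with this is not stated in the abstract.

```python
from collections import deque

import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def entropy(probs):
    """Shannon entropy H(pi) = -sum_k pi_k log pi_k."""
    return -np.sum(probs * np.log(probs + 1e-12))


def reinforce_entropy_grad(theta, actions, returns, beta):
    """Gradient of mean(G * log pi(a)) + beta * H(pi) w.r.t. the logits.

    For a softmax policy, d log pi(a) / d theta = one_hot(a) - pi and
    dH / d theta_k = -pi_k * (log pi_k + H).
    """
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    for a, g in zip(actions, returns):
        one_hot = np.zeros_like(theta)
        one_hot[a] = 1.0
        grad += g * (one_hot - probs)  # Reinforce term for one sample
    grad /= max(len(actions), 1)
    h = entropy(probs)
    grad += beta * (-probs * (np.log(probs + 1e-12) + h))  # entropy bonus
    return grad


def train_bandit(steps=500, lr=0.1, beta=0.01, batch=16, seed=0):
    """Toy run: action 1 pays reward 1, action 0 pays 0 (assumed setup)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    buffer = deque(maxlen=256)  # hypothetical replay buffer of (action, return)
    for _ in range(steps):
        a = int(rng.choice(2, p=softmax(theta)))
        reward = 1.0 if a == 1 else 0.0
        buffer.append((a, reward))
        # Sample a mini-batch uniformly from the stored experience.
        idx = rng.integers(len(buffer), size=min(batch, len(buffer)))
        acts = [buffer[int(i)][0] for i in idx]
        rets = [buffer[int(i)][1] for i in idx]
        theta = theta + lr * reinforce_entropy_grad(theta, acts, rets, beta)
    return softmax(theta)
```

Calling `train_bandit()` returns the final action probabilities. Raising `beta` keeps the policy closer to uniform, which is the exploration effect entropy regularization is meant to provide.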
Keywords/Search Tags: Machine Learning, Deep Reinforcement Learning, Policy Gradient, Reinforce, Game Intelligence