Research Of Game Intelligence Based On Improved Policy Gradient Method

Posted on:2019-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:S L Zhang

Full Text:PDF

GTID:2428330566987565

Subject:Computational Mathematics

Abstract/Summary:

At present, researchers mostly focus on value-based reinforcement learning algorithms, represented by the Deep Q Network (DQN), and neglect policy-based approaches, which are more solid in theory and more intuitive in their update process. In this paper, after studying algorithms such as Reinforce and analyzing their advantages and disadvantages, the 2ER-Reinforce algorithm is proposed. It is a variant of the Reinforce algorithm that integrates entropy regularization and experience replay. The effect of several important hyperparameters on the experimental results is also discussed. Finally, the experimental results, including the network weights, are analyzed visually, and the performance of the 2ER-Reinforce algorithm is demonstrated by training an agent that plays Atari games in imitation of a human player. The main work of this paper is as follows:

(1) Introduce the research background and significance of deep reinforcement learning, and list its applications in games and commerce.
(2) Outline the concept of the Markov Decision Process and formulate a framework for reinforcement learning research. The focus is on the Bellman equation and the dynamic programming solution methods used in model-based learning. The policy improvement methods of policy iteration and value iteration are derived, which provides a theoretical basis for model-free learning.
(3) Briefly introduce the basics of model-free learning in reinforcement learning, mainly value-based algorithms. The similarities and differences between the Monte Carlo method and temporal difference learning are compared. The Sarsa algorithm, the Q-Learning algorithm, and the DQN algorithm, which builds on Q-Learning, are introduced.
(4) Propose the improved 2ER-Reinforce algorithm and apply it to video games. The success of the experiment shows that the 2ER-Reinforce algorithm has practical significance. First, the advantages and disadvantages of the Reinforce algorithm are analyzed, and the 2ER-Reinforce algorithm is put forward based on experience replay and entropy regularization. The Atari game Pong is used as a test environment to compare the performance of four algorithms. The influence of the hyperparameter values on the training effect is discussed. The performance of the trained game agent and a visual analysis of the trained policy network's weights are presented. Finally, the ways in which the algorithm learns and humans learn are considered and compared.
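
The two ingredients named in the abstract, entropy regularization and experience replay, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how 2ER-Reinforce combines the two techniques, so every name here (discounted_returns, reinforce_entropy_loss, EpisodeBuffer), the discount factor gamma, and the entropy weight beta are assumptions made for illustration. The loss is the standard Reinforce objective for a softmax policy with an entropy bonus, and the buffer simply stores whole episodes for later resampling.

```python
import math
import random
from collections import deque


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def reinforce_entropy_loss(logits_per_step, actions, rewards,
                           gamma=0.99, beta=0.01):
    """Reinforce loss with an entropy bonus (beta is an assumed weight).

    loss = -sum_t G_t * log pi(a_t | s_t) - beta * sum_t H(pi(. | s_t))
    """
    returns = discounted_returns(rewards, gamma)
    loss = 0.0
    for logits, action, g in zip(logits_per_step, actions, returns):
        probs = softmax(logits)
        loss -= g * math.log(probs[action])  # policy-gradient term
        loss -= beta * entropy(probs)        # entropy regularization term
    return loss


class EpisodeBuffer:
    """Hypothetical replay buffer that keeps the most recent episodes."""

    def __init__(self, capacity):
        self._episodes = deque(maxlen=capacity)

    def add(self, episode):
        self._episodes.append(episode)

    def sample(self, k):
        # Uniform sampling without replacement; no off-policy correction.
        return random.sample(list(self._episodes), k)

    def __len__(self):
        return len(self._episodes)
```

A larger beta keeps the policy closer to uniform and so encourages exploration. Replaying old episodes with a plain Reinforce update is off-policy, so a practical version would need some correction that this sketch leaves out.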

Keywords/Search Tags:

Machine Learning, Deep Reinforcement Learning, Policy Gradient, Reinforce, Game Intelligence

Related items

1	Research On Game Algorithm Of Imperfect Information 3D Video Game Based On Deep Reinforcement Learning
2	Deep Deterministic Policy Gradient Based On Entropy Regularization And Regular Update
3	Research And Application Of Game Artificial Intelligence System Based On Machine Learning Methods
4	Research And Implementation On Game Control Algorithm Based On Deepening Reinforcement Learning
5	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate
6	Research On Multiagent Cooperation And Applications Based On Reinforcement Learning
7	Research On Agent Decision-making And Control Based On Deep Reinforcement Learning
8	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
9	Optimization On Deep Reinforcement Learning Based On Policy Gradient
10	Deep Reinforcement Learning Based On Policy Gradient Optimization And Its Application In Agent Control