
Research On Algorithms Of Gradient Temporal Difference Evaluation Network For Deep Reinforcement Learning

Posted on: 2021-03-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zheng  Full Text: PDF
GTID: 2392330605471677  Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning has been one of the most active research areas in machine learning in recent years. Its purpose is to have an agent learn, by interacting with its environment, which behavior is optimal for each observation, with the behavior shaped by the rewards the environment returns. This thesis proposes an Actor-Critic network algorithm with a "frozen point" mechanism. It trains the network parameters with the Deep Q-Learning and Policy Gradient algorithms and addresses two shortcomings of the policy gradient algorithm: slow convergence and weak stability. By changing the update method and loss function of the network parameters in the evaluation (Critic) network, and by adding an experience replay mechanism to the Actor network, the improved algorithm trains the network parameters faster and behaves more stably.

The main research contents and contributions of the thesis fall into two parts:

1. The thesis first proposes an Accelerated Linear Approximation method (ALA-AC) built on the Actor-Critic framework. ALA-AC changes the way the deep neural networks' parameters are updated, and the new update mode improves the convergence speed and stability of the algorithm. A large number of experiments show that ALA-AC learns more efficiently and converges faster than the conventional Actor-Critic network algorithm.

2. Building on ALA-AC, the loss function used in the parameter update is changed from the Mean Square Error to the Mean Square Projected Bellman Error, which reduces the training error of the network parameters to a certain extent. In experiments, this change of loss function yields better performance than ALA-AC alone.

The conventional Actor-Critic network algorithm, the ALA-AC algorithm, and the improved algorithm based on ALA-AC are applied to path planning for unmanned vehicles. In repeated verification and a large number of comparative experiments, the two improved algorithms outperform the conventional one.
Keywords/Search Tags:reinforcement learning, policy gradient, actor-critic network, linear approximation, mean square projected bellman error