Research On Algorithms Of Gradient Temporal Difference Evaluation Network For Deep Reinforcement Learning

Posted on:2021-03-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zheng

GTID:2392330605471677

Subject:Control Science and Engineering

Abstract/Summary:

Reinforcement learning is one of the most active areas of recent machine learning research. Its goal is for an agent to learn, by interacting with an environment, which behavior is optimal for what it observes; the behavior is shaped by the rewards the environment returns. This thesis proposes an Actor-Critic network algorithm with a "frozen point" mechanism. It trains the network parameters with the Deep Q-Learning and Policy Gradient algorithms and addresses shortcomings of the policy gradient algorithm such as slow convergence and weak stability. By improving the update method and loss function of the network parameters in the evaluation network, and by adding an experience replay mechanism to the Actor network, the improved algorithm trains the network parameters faster and performs more stably. The main research contents and contributions can be summarized in two parts:

1. The thesis first proposes an Accelerated Linear Approximation method (ALA-AC) within the framework of Actor-Critic network algorithms. ALA-AC changes the way the deep neural networks' parameters are updated, and the new update mode improves the convergence speed and stability of the algorithm. In a large number of experiments, ALA-AC shows higher learning efficiency and faster convergence than the conventional Actor-Critic network algorithm.

2. Building on ALA-AC, the loss function used in the parameter update is changed from the Mean Square Error to the Mean Square Projected Bellman Error, which reduces the training error of the network parameters to a certain extent. Compared with the experimental results of ALA-AC, this change of loss function gives better performance.

The conventional Actor-Critic network algorithm, the ALA-AC algorithm, and the improved algorithm based on ALA-AC are applied to unmanned vehicle path planning. Repeated verification and a large number of comparative experiments show that the two improved algorithms outperform the conventional one.
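The abstract does not give the thesis's own formulas. As a rough illustration only, the Mean Square Projected Bellman Error has a standard closed form when the value function is linear, V(s) = theta . phi(s): MSPBE(theta) = E[delta phi]^T E[phi phi^T]^-1 E[delta phi], where delta is the TD error. The sketch below estimates that quantity from samples with NumPy. The function names, the two-state chain, and the discount factor are all assumptions made for this example, and the linear setting may differ from the deep-network setting used in the thesis.

```python
import numpy as np


def td_errors(theta, phi, rewards, phi_next, gamma):
    """TD error per transition for a linear value function V(s) = theta . phi(s)."""
    return rewards + gamma * (phi_next @ theta) - phi @ theta


def mse_td(theta, phi, rewards, phi_next, gamma):
    """Mean squared TD error (a common MSE-style critic loss)."""
    delta = td_errors(theta, phi, rewards, phi_next, gamma)
    return float(np.mean(delta ** 2))


def mspbe(theta, phi, rewards, phi_next, gamma):
    """Sample estimate of E[delta phi]^T E[phi phi^T]^-1 E[delta phi]."""
    delta = td_errors(theta, phi, rewards, phi_next, gamma)
    b = (phi * delta[:, None]).mean(axis=0)  # estimate of E[delta * phi]
    C = phi.T @ phi / phi.shape[0]           # estimate of E[phi phi^T]
    return float(b @ np.linalg.solve(C, b))


# Hypothetical two-state chain: state 0 -> state 1 (reward 1), state 1 -> state 0 (reward 0).
# One-hot features make the linear model tabular, so the exact values are representable.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[0.0, 1.0], [1.0, 0.0]])
rewards = np.array([1.0, 0.0])
gamma = 0.5

theta_zero = np.zeros(2)
theta_star = np.array([4.0 / 3.0, 2.0 / 3.0])  # solves V0 = 1 + 0.5 * V1, V1 = 0.5 * V0

print(mspbe(theta_zero, phi, rewards, phi_next, gamma))  # 0.5
print(mspbe(theta_star, phi, rewards, phi_next, gamma))  # approximately 0
```

With one-hot features the projection is the identity, so the two losses coincide in this toy case. They differ once the features cannot represent every value function, which is the situation in which the projected error is usually of interest.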

Keywords/Search Tags:

reinforcement learning, policy gradient, actor-critic network, linear approximation, mean square projected bellman error

Related items

1	Study On Adaptive PID Control Strategy Based On Actor-Critic Learning
2	Research On Reinforcement Learning Control Method Of Micro Air Vehicle
3	Research And Implementation Of Actor-Critic Algorithm Model For Aircraft Autonomous Landing
4	Actor-Critic Reinforcement Learning And Applications To Automatic Ship Berthing
5	Research On Bidding Strategy Of Generators In Electricity Market Based On Asynchronous Advantage Actor-Critic Reinforcement Learning
6	Research On Electric Vehicle Routing Problem Based On Reinforcement Learning
7	Asynchronous Generalized Advantage Actor-Critic And Application In Automatic Driving
8	Application Of Neural Network Controller In Thermal Power Plant
9	Researches On Adaptive Critic Learning Control Approaches For Intelligent Driving Vehicles
10	A Resequencing Method For Automobile Painting With Rework Based On Reinforcement Learning