
Asynchronous Generalized Advantage Actor-critic And Application In Automatic Driving

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhao
Full Text: PDF
GTID: 2492306305499304
Subject: Electronics and Communications Engineering
Abstract/Summary:
This thesis first reviews the history of automatic driving and the theory of deep reinforcement learning, and then proposes the Asynchronous Generalized Advantage Actor-critic (G-A3C) algorithm, based on the policy gradient, and applies it to decision-making for automatic driving. Policy gradient methods directly optimize the parameters of a nonlinear function approximator (such as a neural network) by maximizing the cumulative reward, but they suffer from two problems: (1) the samples are not independent and identically distributed and exhibit temporal correlations, and (2) the policy gradient estimates have high variance.

To address the first problem, this thesis adopts a parallel strategy: multiple agents interact with different environment instances at the same time, which decorrelates the samples, and asynchronous gradient descent is used to optimize the nonlinear function approximator stably. For the second problem, a theoretical analysis of the policy gradient shows that the advantage function in the actor-critic framework reduces the gradient variance; however, because the value function must be estimated, the reduction in variance comes at the cost of increased bias. This thesis therefore estimates the advantage function over multiple steps, reducing the variance while keeping the bias within a controlled range.

Building on G-A3C, this thesis uses CARLA as the simulation platform and carries out decision-making simulations for automatic driving. The inputs of the policy network are the images collected by a camera sensor together with the vehicle's own state, and the output is a control signal. To extract image features more effectively, a residual network serves as the image feature extractor, and group normalization is used to accelerate convergence under asynchronous gradient descent. A recurrent neural network is also employed to handle the partially observable setting. For the specific task of autonomous driving, reward functions are designed for speed, steering, collision, and intrusion. Finally, simulations on the CARLA platform demonstrate the effectiveness of the proposed G-A3C for automatic driving decision-making. Illustrative sketches of these components follow below.
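As a concrete picture of the parallel strategy, the following is a minimal PyTorch sketch of A3C-style asynchronous gradient descent: each worker owns its own environment instance and a local copy of the network, and asynchronously pushes its gradients onto a model held in shared memory. The TinyPolicy network, the random tensors standing in for CARLA observations, and the placeholder loss are illustrative assumptions; a full implementation would also share the optimizer state across workers, which is omitted here for brevity.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

class TinyPolicy(nn.Module):
    # Hypothetical stand-in for the actor-critic network sketched later.
    def __init__(self, obs_dim=4, act_dim=2):
        super().__init__()
        self.body = nn.Linear(obs_dim, 32)
        self.pi = nn.Linear(32, act_dim)   # policy head
        self.v = nn.Linear(32, 1)          # value head

    def forward(self, x):
        h = torch.relu(self.body(x))
        return self.pi(h), self.v(h)

def worker(rank, shared_model):
    # Each worker interacts with its own environment instance, which
    # decorrelates the samples across workers.
    local_model = TinyPolicy()
    optimizer = torch.optim.Adam(shared_model.parameters(), lr=1e-4)
    for _ in range(100):
        local_model.load_state_dict(shared_model.state_dict())  # sync weights
        obs = torch.randn(1, 4)            # stand-in for a CARLA observation
        logits, value = local_model(obs)
        # Placeholder loss: the real actor-critic loss weights the policy
        # term by the multi-step advantage shown in the next sketch.
        loss = logits.pow(2).mean() + value.pow(2).mean()
        local_model.zero_grad()
        loss.backward()
        # Copy local gradients onto the shared parameters and step:
        # the asynchronous (Hogwild-style) update at the heart of A3C.
        for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
            sp.grad = lp.grad
        optimizer.step()

if __name__ == "__main__":
    shared = TinyPolicy()
    shared.share_memory()                  # parameters live in shared memory
    procs = [mp.Process(target=worker, args=(r, shared)) for r in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```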
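To make the bias-variance trade-off concrete, here is a minimal sketch of the standard generalized advantage estimation (GAE) recursion that the multi-step advantage evaluation corresponds to, for a single trajectory segment without terminal handling. The discount gamma and the mixing coefficient lam, along with their default values, are assumptions for illustration; the abstract does not state the thesis's actual hyperparameters.

```python
import numpy as np

def generalized_advantage(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Multi-step advantage: A_t = sum_l (gamma*lam)^l * delta_{t+l},
    with delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    lam = 0 gives the one-step advantage (low variance, high bias);
    lam = 1 gives the Monte Carlo advantage (high variance, low bias)."""
    values = np.append(values, last_value)   # bootstrap with V(s_T)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

Choosing lam between these extremes is exactly the mechanism by which the variance is reduced while the bias stays within a controlled range.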
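The policy network described above, with a residual feature extractor, group normalization, recurrence for partial observability, and fusion of image features with the vehicle's state, might look like the following sketch. All layer sizes, the number of residual blocks, and the action dimension (e.g. steering and throttle) are assumptions; the abstract does not specify the exact architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # Residual block with GroupNorm; unlike BatchNorm, GroupNorm does not
    # depend on batch statistics, which suits asynchronous training where
    # each worker sees small, differently distributed batches.
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.gn1 = nn.GroupNorm(8, ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.gn2 = nn.GroupNorm(8, ch)

    def forward(self, x):
        h = torch.relu(self.gn1(self.conv1(x)))
        h = self.gn2(self.conv2(h))
        return torch.relu(x + h)

class DrivingPolicy(nn.Module):
    def __init__(self, state_dim=3, act_dim=2, hidden=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2),
            nn.GroupNorm(8, 32), nn.ReLU())
        self.blocks = nn.Sequential(ResBlock(32), ResBlock(32))
        self.pool = nn.AdaptiveAvgPool2d(1)
        # The LSTM over time steps handles partial observability.
        self.lstm = nn.LSTM(32 + state_dim, hidden, batch_first=True)
        self.pi = nn.Linear(hidden, act_dim)   # control signal, e.g. steer/throttle
        self.v = nn.Linear(hidden, 1)          # value estimate for the critic

    def forward(self, images, car_state, hc=None):
        # images: (B, T, 3, H, W); car_state: (B, T, state_dim)
        B, T = images.shape[:2]
        feats = self.pool(self.blocks(self.stem(images.flatten(0, 1)))).flatten(1)
        feats = feats.view(B, T, -1)
        # Fuse image features with the vehicle's own state before recurrence.
        out, hc = self.lstm(torch.cat([feats, car_state], dim=-1), hc)
        return self.pi(out), self.v(out), hc
```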
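Finally, the reward shaping can be pictured with a sketch like the one below. Only the four terms (speed, steering, collision, and intrusion) come from the abstract; the target speed, weights, and penalty magnitudes are hypothetical values chosen for illustration.

```python
def reward(speed_kmh, steer, collided, intruded, target_speed=30.0):
    # Hypothetical reward shaping over the four terms named in the thesis.
    r = 1.0 - abs(speed_kmh - target_speed) / target_speed  # track target speed
    r -= 0.5 * abs(steer)   # discourage harsh steering
    if collided:
        r -= 10.0           # collision penalty
    if intruded:
        r -= 5.0            # intrusion (e.g. leaving the lane) penalty
    return r
```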
Keywords/Search Tags:Deep reinforcement learning, Generalized advantage function, Asynchronous advantage actor-critic, Asynchronous generalized advantage actor-critic, Automatic driving