Research Of Unmanned Driving Policy Based On Aggregated Multiple Deep Deterministic Policy Gradient

Posted on: 2020-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: J T Wu
Full Text: PDF
GTID: 2382330596964236
Subject: Signal and Information Processing
Abstract/Summary:
Unmanned driving has become a research hotspot in both academia and industry. Research on unmanned driving helps solve problems that exist in human driving, such as frequent traffic accidents and heavy traffic jams, and is therefore of great practical significance. This paper applies deep reinforcement learning (DRL) to unmanned driving in a simulated environment and proposes a method for learning the control policy of an unmanned-driving car with less learning time and better performance. The deep deterministic policy gradient (DDPG) algorithm, which operates over a continuous action space, has attracted great attention in reinforcement learning. However, the exploration strategy through dynamic programming within the Bayesian belief state space is rather inefficient even for simple systems.

This paper presents an aggregated multiple deep deterministic policy gradient (AMDDPG) algorithm to learn the control policy of an unmanned-driving car. The proposed method simultaneously trains multiple sub-optimal sub-policies, based on a structure of multiple DDPG learners and two different training modes, and then aggregates the sub-policies. During training, a centralized experience replay technique is used to break the correlation between experiences and improve the utilization of training samples. To verify the feasibility and validity of using AMDDPG to learn an unmanned driving policy, a simulation system for studying unmanned driving control policy was built on AMDDPG and The Open Racing Car Simulator (TORCS), which avoids the high cost and poor safety of real-car experiments. Additionally, we designed the reward function as a product of multiple terms that reflect our expectations for the behavior of the car, such as driving fast, braking before a turn, and staying close to the central axis of the track.

Finally, we carried out simulated experiments with the unmanned-driving car and tested the performance of the aggregated policy. The results demonstrate that, compared to DDPG, the proposed AMDDPG algorithm has a more stable learning process for the unmanned driving control policy and reduces the training time by 56.7%. In addition, it achieves strong generalization ability. The appropriate range for the number of sub-policies used in policy aggregation is also analyzed. We find that 3 to 10 sub-policies work better in practical application scenarios.
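
The three ingredients named in the abstract (policy aggregation, a multiplicative reward, and centralized experience replay) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not state the aggregation rule, so averaging the sub-policy actions is an assumption made here. The names `aggregate_action`, `product_reward`, and `CentralizedReplayBuffer`, as well as the three reward terms, are hypothetical and chosen only for illustration.

```python
import random
from collections import deque
from typing import Callable, List, Sequence

Action = List[float]
Policy = Callable[[Sequence[float]], Action]


def aggregate_action(sub_policies: Sequence[Policy], state: Sequence[float]) -> Action:
    """Combine sub-policies by averaging their actions component-wise.

    Assumption: the abstract only says the sub-policies are aggregated;
    a plain mean is used here as one simple possibility.
    """
    actions = [policy(state) for policy in sub_policies]
    n = len(actions)
    return [sum(component) / n for component in zip(*actions)]


def product_reward(speed_term: float, brake_term: float, center_term: float) -> float:
    """Multiply per-behavior terms (hypothetical terms, not the thesis's formula).

    With a product, any single term near zero suppresses the whole reward.
    """
    return speed_term * brake_term * center_term


class CentralizedReplayBuffer:
    """A single buffer shared by all sub-policy learners (illustrative)."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self._storage)

    def add(self, transition: tuple) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> list:
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        k = min(batch_size, len(self._storage))
        return rng.sample(list(self._storage), k)
```

In this sketch, every learner would call `add` on the same buffer instance and draw its own mini-batches with `sample`, while `aggregate_action` would be used only at evaluation time to drive the car with the combined policy.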
Keywords: Unmanned Driving, Deep Reinforcement Learning, Deep Deterministic Policy Gradient, Policy Aggregation