
Study On Driving Policy Of Autonomous Unmanned System Based On Deep Reinforcement Learning

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Li
GTID: 2492306470462964
Subject: Control Science and Engineering
Abstract/Summary:
With rapid social and economic development, car ownership in China has been steadily increasing, which has led to serious environmental pollution and traffic safety problems. Autonomous driving, a multidisciplinary technology, aims to make driving safer and more efficient through autonomous, intelligent control policies, and is therefore significant for improving traffic safety and traffic efficiency. Reinforcement learning (RL) is a reward-based machine learning method that is widely used for policy learning, and the advent of deep reinforcement learning has extended it to a range of more complex and realistic decision-making problems. Because autonomous driving must handle a wide variety of complex and changing traffic scenarios, deep reinforcement learning has broad application prospects in this field. In this thesis, we study driving policies for autonomous unmanned systems and apply deep reinforcement learning to multiple driving tasks in a virtual driving simulator.

The main research content is as follows. We build an interactive RL framework around TORCS (The Open Racing Car Simulator) and propose an end-to-end autonomous driving policy method based on deep reinforcement learning to solve the lane keeping and overtaking tasks. Because traditional RL methods require long training times, we propose a policy-based RL algorithm with a stochastic policy, built on Proximal Policy Optimization (PPO). First, we introduce a curiosity-driven method called Random Network Distillation (RND), which generates an intrinsic reward signal that encourages the agent to explore its environment; this improves the agent's exploration efficiency and lets the policy learn faster. Second, we add an auxiliary critic network to the original Actor-Critic framework, yielding a clipped dual critic network: when the networks are updated, we take the lower of the two critics' estimates to avoid overestimation bias. Third, we combine this with Generalized Advantage Estimation (GAE), which reduces the variance of the policy gradient estimates at the cost of introducing some bias. Experimental results show that the improved PPO algorithm raises both training efficiency and control performance on driving tasks.

We apply the proposed method to multiple driving tasks in the TORCS environment and compare it with DQN, DDPG, TD3, and PPO. For each task (lane keeping, lane keeping while following a target speed, and overtaking), we define the state input and action according to the task's characteristics and design a corresponding reward function. In the lane keeping experiments, both with and without a target speed, the results show that the proposed method improves training performance: our model completes the task earlier and learns the policy faster. Moreover, judged by distance error and angle error, its control is better and more stable. The method also generalizes well: the model completes the task on multiple unseen tracks while maintaining good control performance. Finally, we set up various opponents, define a new state input and reward function, and apply the method to the overtaking task; the results show that our model is able to overtake the opponents' cars in the experiments.
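The improved algorithm is built on PPO. As background, the standard PPO clipped surrogate objective can be sketched in plain Python as follows. This is a minimal illustration, not the thesis's implementation; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions.

```python
import math
from typing import Sequence


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective (to be minimized), batch-averaged.

    new_logp / old_logp are log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

The clipping removes any incentive to push the probability ratio outside [1 − ε, 1 + ε] in the direction the advantage favors, which keeps each policy update close to the data-collecting policy.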
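RND produces an intrinsic reward from the prediction error of a trained predictor network against a fixed, randomly initialized target network: frequently visited states become easy to predict and earn little bonus, while novel states earn a larger one. The toy sketch below uses a single linear layer (with tanh on the target) in NumPy instead of deep networks and omits the observation and reward normalization that practical RND implementations use; it is not the thesis's network, and the class name `RND` and all hyperparameters are assumptions. In training, the intrinsic reward would typically be added to the environment reward with a weight, e.g. r = r_ext + β·r_int, where β is a hypothetical tuning parameter.

```python
import numpy as np


class RND:
    """Toy Random Network Distillation with single-layer feature maps."""

    def __init__(self, obs_dim: int, feat_dim: int = 16, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed random target network: never updated.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network: trained to imitate the target's outputs.
        self.w_pred = 0.1 * rng.normal(size=(obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs: np.ndarray) -> np.ndarray:
        # Predictor output minus target output, shape (batch, feat_dim).
        return obs @ self.w_pred - np.tanh(obs @ self.w_target)

    def intrinsic_reward(self, obs: np.ndarray) -> np.ndarray:
        """Squared prediction error for each row of obs (higher = more novel)."""
        return np.sum(self._error(obs) ** 2, axis=-1)

    def update(self, obs: np.ndarray) -> float:
        """One gradient step on the predictor's MSE; returns the pre-step loss."""
        err = self._error(obs)
        grad = 2.0 * obs.T @ err / obs.shape[0]  # gradient of mean squared error
        self.w_pred -= self.lr * grad
        return float(np.mean(np.sum(err ** 2, axis=-1)))
```

Repeatedly calling `update` on the same observations drives their intrinsic reward toward zero, while observations the predictor has not been trained on keep a larger bonus, which is what pushes the agent to explore.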
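The clipped dual critic resembles the clipped double-Q trick used in TD3: two critics are trained toward the same targets, and the smaller estimate is used when bootstrapping, so a single critic's overestimate does not propagate into the targets. A minimal sketch, assuming the minimum is applied to next-state values in one-step targets (the abstract does not state exactly where the minimum enters the update); the names `clipped_dual_targets` and `critic_losses` are hypothetical.

```python
from typing import List, Sequence, Tuple


def clipped_dual_targets(rewards: Sequence[float], next_v1: Sequence[float],
                         next_v2: Sequence[float], dones: Sequence[bool],
                         gamma: float = 0.99) -> List[float]:
    """One-step bootstrapped value targets using the lower of two critic estimates."""
    targets = []
    for r, a, b, d in zip(rewards, next_v1, next_v2, dones):
        v_next = min(a, b)  # pessimistic estimate counters overestimation bias
        targets.append(r + gamma * v_next * (0.0 if d else 1.0))
    return targets


def critic_losses(v1: Sequence[float], v2: Sequence[float],
                  targets: Sequence[float]) -> Tuple[float, float]:
    """Mean squared error of each critic against the shared targets."""
    n = len(targets)
    loss1 = sum((v - y) ** 2 for v, y in zip(v1, targets)) / n
    loss2 = sum((v - y) ** 2 for v, y in zip(v2, targets)) / n
    return loss1, loss2
```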
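GAE trades variance for bias through a parameter λ: the advantage is an exponentially weighted sum of TD residuals δ_t = r_t + γV(s_{t+1}) − V(s_t), with λ = 0 giving the one-step TD advantage (low variance, more bias) and λ = 1 giving the Monte Carlo return minus the value baseline (unbiased, high variance). A minimal sketch of the standard backward recursion over one rollout follows; the defaults γ = 0.99 and λ = 0.95 are assumptions, not values from the thesis.

```python
from typing import List, Sequence, Tuple


def compute_gae(rewards: Sequence[float], values: Sequence[float],
                dones: Sequence[bool], last_value: float,
                gamma: float = 0.99, lam: float = 0.95
                ) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one rollout.

    last_value is V(s_T), used to bootstrap the final step if the episode
    did not end there. Returns (advantages, returns), returns = adv + values.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        mask = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * mask - values[t]  # TD residual
        gae = delta + gamma * lam * mask * gae  # weighted sum of future residuals
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

The returned `returns` serve as critic targets and `advantages` feed the policy loss.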
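The abstract does not give the reward functions. For illustration only, the sketch below uses a lane keeping reward shape common in public TORCS RL examples: longitudinal progress v·cos θ, minus lateral velocity |v·sin θ|, minus a penalty v·|trackPos| for distance from the track center. Here θ is the angle between the car's heading and the track axis and trackPos is the normalized lateral offset, which correspond to the angle error and distance error used above to evaluate control. The optional target-speed term for the speed-following variant, its weight `k_speed`, and the function name are hypothetical, not taken from the thesis.

```python
import math
from typing import Optional


def lane_keeping_reward(speed_x: float, angle: float, track_pos: float,
                        target_speed: Optional[float] = None,
                        k_speed: float = 1.0) -> float:
    """Illustrative TORCS lane keeping reward (hypothetical, not the thesis's).

    speed_x: speed along the car's heading; angle: heading vs. track axis (rad);
    track_pos: normalized lateral offset from the track center (0 = center).
    """
    reward = (speed_x * math.cos(angle)           # progress along the track
              - abs(speed_x * math.sin(angle))    # penalize sideways motion
              - speed_x * abs(track_pos))         # penalize drifting off center
    if target_speed is not None:
        # Hypothetical term for the speed-following task.
        reward -= k_speed * abs(speed_x - target_speed)
    return reward
```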
Keywords/Search Tags: Autonomous Driving, Deep Reinforcement Learning, Driving Policy, TORCS