
Conditional Affordance And Data Aggregation Reinforcement Learning Methods For Urban Driving Tasks

Posted on: 2022-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: P Hu
Full Text: PDF
GTID: 2492306569994589
Subject: Computer Science and Technology
Abstract/Summary:
Urban driving is one of the most promising research areas at present, with great commercial and scientific value. Imitation learning is one of the main approaches to this task, but it requires large amounts of labeled training data. Moreover, for a vehicle to handle extreme situations (such as an imminent collision), a large amount of training data covering those situations is needed, which demands substantial manpower and resources and is difficult to collect. Reinforcement learning, by contrast, uses a reward mechanism to let the vehicle explore and learn in the environment, so no labeled training data is needed; by fitting the policy and value estimates with neural networks, it can perceive and understand environment states and thus achieve end-to-end driving and navigation control.

However, because reinforcement learning learns from rewards, the reward signal is very weak compared with direct supervision on actions, and high-dimensional environmental observations limit the capacity of the replay buffer, so reinforcement learning methods struggle to train large networks with many parameters. In addition, the experience that plays the most important role in learning, such as an imminent collision or a required turn at an intersection, accounts for a very small proportion of the total data generated by the agent's interaction with the environment, so the agent may fail to learn key strategies for a long time.

To address the difficulty of learning complex networks with reinforcement learning in this task, this thesis introduces conditional affordances into reinforcement learning. We pre-train a dimensionality-reduction encoder for the environment state and concatenate the encoded result with discrete state information that carries prior knowledge. Using this concatenation as the algorithm's input reduces the complexity of the environment state, increases the effective capacity of the replay buffer, and leaves the reinforcement learning method with only a small network to learn. Experimental results show that reinforcement learning based on conditional affordances effectively reduces the difficulty of learning, and that value-iteration-based methods have a significant advantage over policy-gradient-based methods in training difficulty and convergence speed.

To address the problem that efficient experience is rare and key strategies cannot be learned for a long time, this thesis proposes distributed prioritized experience replay to increase the diversity of the data generated by the agents' interaction with the environment. Based on data aggregation, key states are collected according to the vehicle's task type and the performance of the current policy, assigned appropriate priorities, and experience sampled from this key-state set is used for reinforcement learning training. Experimental results show that our method surpasses existing reinforcement learning methods on the monocular-camera autonomous driving benchmark and approaches the imitation learning method with the highest success rate. In the most difficult driving situations, our approach generalizes better than current best methods, with fewer collisions, fewer red-light violations, and safer driving.
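The conditional-affordance input construction described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual architecture: `ReductionEncoder` here is a fixed random projection standing in for the pre-trained dimensionality-reduction network, and the particular discrete information chosen (a one-hot navigation command plus two affordance scalars) is a hypothetical example of "discrete state information with prior knowledge".

```python
import numpy as np

class ReductionEncoder:
    """Placeholder for the pre-trained dimensionality-reduction encoder.
    A real system would use trained network weights; here a fixed random
    projection plus tanh stands in for the learned mapping."""
    def __init__(self, obs_dim: int, latent_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((obs_dim, latent_dim)) / np.sqrt(obs_dim)

    def encode(self, obs: np.ndarray) -> np.ndarray:
        return np.tanh(obs @ self.W)

def build_rl_state(obs, encoder, command_id, num_commands, affordances):
    """Concatenate the low-dimensional encoding with discrete prior
    knowledge: a one-hot navigation command and scalar affordances."""
    latent = encoder.encode(obs)
    command = np.zeros(num_commands)
    command[command_id] = 1.0  # e.g. 0=follow, 1=left, 2=right, 3=straight
    return np.concatenate([latent, command, np.asarray(affordances, dtype=float)])

# Example: an 84x84 grayscale observation flattened to a vector, a 32-d
# latent, 4 navigation commands, and 2 affordance scalars
# (distance to lane centre, red-light flag) — all illustrative sizes.
encoder = ReductionEncoder(obs_dim=84 * 84, latent_dim=32)
obs = np.zeros(84 * 84)
state = build_rl_state(obs, encoder, command_id=2, num_commands=4,
                       affordances=[0.1, 0.0])
print(state.shape)  # (38,) = 32 latent + 4 command + 2 affordance dims
```

The resulting 38-dimensional state is far cheaper to store in a replay buffer than the raw observation, which is the capacity gain the abstract refers to.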
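The key-state prioritization could be sketched like this. It is a single-process toy, not the distributed version in the thesis; `KeyStatePrioritizedBuffer`, the `key_boost` factor, and the TD-error-based base priority are illustrative assumptions for how rare but important transitions (imminent collisions, intersection turns) can be sampled far more often than their raw frequency.

```python
import numpy as np

class KeyStatePrioritizedBuffer:
    """Toy prioritized replay buffer: transitions flagged as 'key'
    receive a priority boost, so sampling is proportional to boosted
    priority rather than raw frequency."""
    def __init__(self, capacity: int, key_boost: float = 10.0, seed: int = 0):
        self.capacity = capacity
        self.key_boost = key_boost
        self.data, self.priorities = [], []
        self.rng = np.random.default_rng(seed)

    def add(self, transition, td_error: float, is_key: bool):
        p = abs(td_error) + 1e-3          # small epsilon keeps p > 0
        if is_key:
            p *= self.key_boost           # aggregate key states at high priority
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size: int):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = self.rng.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx]

# 100 ordinary transitions and one key transition (a near-collision):
buf = KeyStatePrioritizedBuffer(capacity=1000)
for t in range(100):
    buf.add(("ordinary", t), td_error=0.5, is_key=False)
buf.add(("near_collision", 100), td_error=0.5, is_key=True)
batch = buf.sample(64)
key_frac = sum(1 for s in batch if s[0] == "near_collision") / len(batch)
```

With equal TD errors, the key transition's sampling probability is roughly nine times its 1/101 raw frequency here, which is the mechanism by which rare safety-critical experience stops being drowned out.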
Keywords/Search Tags: autonomous driving, deep reinforcement learning, conditional affordance, distributed prioritized experience replay, data aggregation