
Research On Autonomous Driving Technology Based On Inverse Reinforcement Learning

Posted on: 2020-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
Full Text: PDF
GTID: 2392330590474238
Subject: Control engineering
Abstract/Summary:
With the development of machine learning algorithms, autonomous driving technology continues to advance and is expected to have a significant impact on future urban traffic. Decision-making and control algorithms form its core module. Existing approaches, including expert rule bases and behavioral cloning, generalize poorly and are not suited to complex scenes. Reinforcement learning algorithms can explore and can optimize a policy that generalizes better. However, state-of-the-art reinforcement learning suffers from a high cost of exploration and from the difficulty of specifying a reward function. To address these problems, this dissertation presents a modified policy optimization algorithm, uses inverse reinforcement learning to learn the optimal reward function, and applies both to the autonomous driving decision task.

To reduce the high cost of exploration in reinforcement learning decision making, this dissertation presents a deep deterministic policy gradient algorithm combined with an expert supervised loss. A combined sampling mechanism draws training data from both the expert demonstrations and the self-generated data. For the expert training data, the mean squared error between the expert policy and the current policy is used as the expert supervised loss, and it is combined with the original policy gradient to optimize the policy. For the self-generated training data, the policy is updated by the original policy gradient alone. The expert supervised loss thus guides the policy toward the expert policy, while the agent continues to learn through its own exploration. The policy learning speed, the volatility of the training process, and the resulting optimal policy are compared and analyzed in The Open Racing Car Simulator (TORCS), and autonomous driving decision simulation examples show the effectiveness of the proposed algorithm.

To address the difficulty of constructing the reward function empirically, the maximum entropy inverse reinforcement learning algorithm is adopted to learn the optimal reward function. By analyzing the expert demonstration data, this dissertation extracts important state features and constructs the reward function as a linear combination of them. A probability model of the expert demonstration data is established based on the principle of maximum entropy, and the parameters of the reward function are iteratively optimized with the goal of maximizing the likelihood of the expert trajectories. With the learned reward function taken as the optimal reward function, the proposed policy optimization algorithm is used to optimize the policy. The policy learning speed, the volatility of the training process, the optimal policy, and the generalization ability are analyzed in detail, and simulation examples demonstrate that the learned reward function is effective.
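As a rough illustration of two ideas in the abstract, the NumPy sketch below shows (a) a mean-squared-error expert supervised loss and (b) one gradient-ascent step of maximum entropy inverse reinforcement learning with a linear reward. It is not the dissertation's implementation. The function names, the learning rate, and the choice to normalize trajectory probabilities over a small, explicitly listed set of candidate trajectories are all assumptions made for this example.

```python
import numpy as np


def expert_supervised_loss(policy_actions, expert_actions):
    """Mean squared error between current-policy and expert actions.

    Hypothetical helper: the abstract only states that an MSE between
    the expert policy and the current policy is used.
    """
    diff = np.asarray(policy_actions, dtype=float) - np.asarray(expert_actions, dtype=float)
    return float(np.mean(diff ** 2))


def maxent_irl_step(theta, expert_features, candidate_features, lr=0.1):
    """One gradient-ascent step on the MaxEnt log-likelihood.

    Assumes a linear reward R(tau) = theta . f(tau) and the model
    P(tau) proportional to exp(R(tau)), normalized over the rows of
    `candidate_features` (an assumption of this sketch).
    """
    theta = np.asarray(theta, dtype=float)
    expert_features = np.asarray(expert_features, dtype=float)
    candidate_features = np.asarray(candidate_features, dtype=float)

    rewards = candidate_features @ theta
    rewards = rewards - rewards.max()          # shift for numerical stability
    probs = np.exp(rewards) / np.exp(rewards).sum()

    # Gradient: expert feature mean minus model-expected features.
    expected_features = probs @ candidate_features
    grad = expert_features.mean(axis=0) - expected_features
    return theta + lr * grad


# Toy usage: two candidate trajectories with one-hot features;
# the expert always chooses the first one.
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])
expert = np.array([[1.0, 0.0]])
theta = np.zeros(2)
for _ in range(50):
    theta = maxent_irl_step(theta, expert, candidates)
```

After the loop, the weight on the feature visited by the expert exceeds the weight on the other feature, so the learned reward ranks the expert's trajectory higher. In a driving task the candidate set would have to be replaced by sampled or enumerated trajectories, which this sketch does not cover.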
Keywords/Search Tags: inverse reinforcement learning, autonomous driving, expert demonstration data, expert supervised loss, reward function