
Research On Autonomous Driving Strategy Algorithm Based On Value Distribution Reinforcement Learning

Posted on: 2024-09-29  Degree: Master  Type: Thesis
Country: China  Candidate: K Wang  Full Text: PDF
GTID: 2542307100495324  Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
With the continuous improvement of China's comprehensive national strength and the acceleration of modernization, the number of cars and the number of drivers in China have increased year by year and now rank first in the world. Although cars bring great convenience to people's lives, they also bring social problems such as energy waste, air pollution, and traffic accidents. Traditional driver-assistance systems can reduce traffic accidents to a certain extent, but they do not fundamentally eliminate the safety hazards caused by human error, so the emergence of autonomous driving technology has become an inevitable trend. With the rise of artificial intelligence and the rapid development of deep learning and reinforcement learning, deep reinforcement learning, which combines the advantages of both, provides a new direction for the development of autonomous driving technology.

Traditional value-function-based reinforcement learning algorithms suffer from a loss of value information. To address this problem, value distribution reinforcement learning models the state-action value function as a distribution, i.e., the distribution of returns obtained by taking a certain action in a certain state. By iteratively updating these distributions, more complete value information about each action is retained, thereby improving learning performance. Based on the value distribution reinforcement learning algorithm Quantile Regression DQN (QR_DQN), this paper introduces a prioritized experience replay mechanism to increase the probability of sampling experiences with higher learning value, and a multi-step experience replay mechanism to make the algorithm converge faster. The improved algorithm is named the W Quantile Regression DQN algorithm (WQR_DQN).

In addition, this paper constructs an autonomous driving policy model with multiple network inputs, in which a CNN network processes the on-board camera data and an LSTM network processes the on-board radar perception data, so that the state of the environment around the vehicle is captured comprehensively. This paper also designs the three key elements of reinforcement learning, namely the agent's state space, the agent's action space, and the reward function, and builds an autonomous driving simulation platform on which the performance of the different algorithms is evaluated.

Across several comparative experiments, the WQR_DQN algorithm proposed in this paper shows a performance improvement over the QR_DQN algorithm. When comparing vehicles trained with three autonomous driving policy models using different sensor inputs, the model with combined camera and radar inputs shows stronger environment perception, especially in the presence of fog and noise interference. The experimental results show that the optimized value distribution reinforcement learning algorithm improves the accuracy and convergence speed of the autonomous driving policy model and achieves better overall performance.
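To make the algorithmic description above more concrete, the following is a minimal sketch (in PyTorch-style Python) of a QR_DQN-type quantile regression loss that combines an n-step return target with importance-sampling weights from prioritized experience replay, the two mechanisms that distinguish WQR_DQN from QR_DQN in the abstract. The network size, the number of quantiles, the n-step horizon, and all helper names are illustrative assumptions and are not taken from the thesis.

```python
# Minimal sketch of a quantile-regression (QR_DQN-style) loss with an
# n-step return target and prioritized-replay importance weights.
# Hyperparameters, layer sizes, and helper names are illustrative
# assumptions, not the thesis' actual settings.
import torch
import torch.nn as nn

N_QUANTILES = 51   # number of quantiles modelling the value distribution
GAMMA = 0.99       # discount factor
N_STEP = 3         # multi-step return horizon (assumed value)


class QuantileQNetwork(nn.Module):
    """Maps a state vector to N_QUANTILES quantile estimates per action."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions * N_QUANTILES),
        )

    def forward(self, state):
        # Output shape: (batch, n_actions, N_QUANTILES)
        return self.net(state).view(-1, self.n_actions, N_QUANTILES)


def wqr_dqn_loss(online, target, states, actions, n_step_rewards,
                 next_states, dones, is_weights):
    """Quantile Huber loss with an n-step bootstrapped target.

    `n_step_rewards` is the discounted sum of the next N_STEP rewards;
    `is_weights` are importance-sampling weights from prioritized replay.
    Returns the weighted loss and per-sample errors for priority updates.
    """
    batch = states.size(0)
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES

    # Quantile estimates of the actions that were actually taken: (B, N)
    theta = online(states)[torch.arange(batch), actions]

    with torch.no_grad():
        next_dist = target(next_states)                      # (B, A, N)
        next_a = next_dist.mean(dim=2).argmax(dim=1)         # greedy next action
        next_theta = next_dist[torch.arange(batch), next_a]  # (B, N)
        # Distributional Bellman target with an n-step return.
        t_theta = n_step_rewards.unsqueeze(1) + \
            (GAMMA ** N_STEP) * (1.0 - dones).unsqueeze(1) * next_theta

    # Pairwise TD errors between target and online quantiles: (B, N, N)
    u = t_theta.unsqueeze(2) - theta.unsqueeze(1)
    huber = torch.where(u.abs() <= 1.0, 0.5 * u.pow(2), u.abs() - 0.5)
    rho = (taus.view(1, 1, -1) - (u.detach() < 0).float()).abs() * huber
    per_sample = rho.mean(dim=1).sum(dim=1)                  # (B,)
    return (is_weights * per_sample).mean(), per_sample.detach()
```

In such a setup, the returned per-sample errors would typically be fed back to the replay buffer as new priorities, so that transitions with larger errors are sampled more often on later updates.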
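Similarly, the multi-input policy model described above (a CNN branch for on-board camera frames and an LSTM branch for sequential radar measurements) might be organized roughly as in the sketch below. The layer sizes, the 84x84 input resolution, and the concatenation-based fusion are assumptions for illustration only, not the thesis' exact architecture.

```python
# Illustrative two-branch state encoder: CNN for camera frames, LSTM for
# radar sequences, fused before a quantile output head. All layer sizes
# are assumed for illustration.
import torch
import torch.nn as nn


class CameraRadarEncoder(nn.Module):
    def __init__(self, radar_dim: int, n_actions: int, n_quantiles: int = 51):
        super().__init__()
        self.n_actions = n_actions
        self.n_quantiles = n_quantiles
        # CNN branch: single-channel 84x84 camera frame -> feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 256), nn.ReLU(),
        )
        # LSTM branch: a short sequence of radar readings -> feature vector.
        self.lstm = nn.LSTM(input_size=radar_dim, hidden_size=128,
                            batch_first=True)
        # Fusion head producing quantile estimates for every action.
        self.head = nn.Sequential(
            nn.Linear(256 + 128, 256), nn.ReLU(),
            nn.Linear(256, n_actions * n_quantiles),
        )

    def forward(self, camera, radar_seq):
        # camera: (batch, 1, 84, 84); radar_seq: (batch, seq_len, radar_dim)
        img_feat = self.cnn(camera)
        _, (h_n, _) = self.lstm(radar_seq)
        fused = torch.cat([img_feat, h_n[-1]], dim=1)
        return self.head(fused).view(-1, self.n_actions, self.n_quantiles)


if __name__ == "__main__":
    # Example forward pass with dummy tensors.
    net = CameraRadarEncoder(radar_dim=16, n_actions=5)
    cam = torch.zeros(2, 1, 84, 84)
    radar = torch.zeros(2, 10, 16)
    print(net(cam, radar).shape)  # torch.Size([2, 5, 51])
```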
Keywords/Search Tags: Value distribution reinforcement learning, Multi-step return, Prioritized experience replay, Autonomous driving decision-making