Traffic congestion is an inevitable problem in the process of urbanization. Among the many measures for alleviating congestion, optimizing the control strategy of traffic signal lights is one of the most economical and efficient. This paper studies the application of deep reinforcement learning algorithms to the traffic signal scheduling problem. First, the experimental environment and the reinforcement learning model are established in road simulation software, and the necessary state space, action space and reward function are designed. Then, two kinds of deep reinforcement learning algorithms, a value-function algorithm and a policy-gradient algorithm, are applied to train the proposed model, and several improvements are made on this basis. The experimental results show that the improved algorithms can effectively improve the traffic scheduling ability of the signal lights. The specific work is as follows:

First, the interface between the model and the environment is designed, and the simulation environment is built on this basis. The state is designed as the intuitive feature input of the environment, the action as the next phase to be switched to, and the reward as the difference in the average vehicle waiting time before and after the current moment. When calculating the reward, a hyperbolic tangent function is applied to improve stability. The road simulation scenarios are of two types: simulated scenarios and real scenarios. The simulated scenario generates a specified traffic flow, while the real scenario replays recorded real traffic flow.

Secondly, the application of the value-function algorithm to the traffic signal scheduling problem is studied, and the advantages and disadvantages of D3QN (Dueling Double Deep Q-Network) for this problem are expounded. Considering that uniform random sampling from the algorithm's experience replay buffer does not fit the characteristics of the traffic signal scheduling problem, a replay sampling method based on state counting is proposed, and the advantage of the improved algorithm is demonstrated by small-scale principal component analysis experiments. Considering that the algorithm's action sampling ignores the importance of suboptimal actions, an action sampling method based on the Boltzmann distribution is proposed, and the rationality of this improvement is shown by a simple derivation. The hyperparameters of the D3QN algorithm are then introduced; the experimental results under different hyperparameters are compared in random scenarios, and relatively optimal hyperparameter values are determined. After that, comparative experiments on the improved algorithm and the fixed-phase strategy are carried out in the simulated scenario and the real scenario respectively. The simulation results show that the improved D3QN algorithm reduces the average vehicle waiting time from 24.76 s to 10.48 s, a reduction of 57.7%, reduces the average vehicle queue length from 15.99 m to 13.17 m, a reduction of 17.6%, and reduces the average vehicle waiting time in the real scenario from 33.03 s to 17.65 s, a reduction of 46.6%.
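The Boltzmann-distribution action sampling described above can be sketched as follows. This is a minimal illustration in Python, assuming a NumPy array of Q-values and an illustrative temperature parameter; the function name and the example values are hypothetical and not taken from the thesis.

```python
# A minimal sketch of Boltzmann (softmax) action selection over Q-values.
# The function name and the temperature value are illustrative assumptions;
# the thesis only states that actions are sampled from a Boltzmann
# distribution rather than by a purely greedy or epsilon-greedy rule.
import numpy as np

def boltzmann_action(q_values: np.ndarray, temperature: float = 1.0) -> int:
    """Sample an action index with probability proportional to exp(Q / T)."""
    # Subtract the max for numerical stability before exponentiating.
    logits = (q_values - np.max(q_values)) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    # Sub-optimal actions keep a non-zero probability, so they are still
    # explored in proportion to their estimated value.
    return int(np.random.choice(len(q_values), p=probs))

# Example: the second-best action is occasionally chosen instead of never.
q = np.array([1.2, 1.0, -0.5, 0.3])
print(boltzmann_action(q, temperature=0.5))
```

Because every action retains a probability proportional to exp(Q/T), suboptimal phases are still sampled occasionally, which reflects the motivation stated above.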
Finally, the application of the policy-gradient algorithm to the traffic signal scheduling problem is studied, and the advantages and disadvantages of PPO (Proximal Policy Optimization) for this problem are expounded. Considering that there is little research on the sampling method of this algorithm, four different batch sampling methods are discussed, and a sampling method with replacement and repetition is proposed on the basis of the original algorithm. Considering that the algorithm uses a single clipping mode, a multi-segment clipping mechanism is proposed. The necessity of these improvements is illustrated by small-scale test experiments in the Gym environment. The hyperparameters of the PPO algorithm are then introduced; the experimental results under different hyperparameters are compared in random scenarios, and relatively optimal hyperparameter values are determined. Comparative experiments on the improved algorithm and the fixed-phase strategy are carried out in the simulated scenario and the real scenario respectively. The simulation results show that the improved PPO algorithm reduces the average vehicle waiting time from 101.97 s to 71.06 s, a reduction of 29.2%, reduces the average vehicle queue length from 20.84 m to 18.29 m, a reduction of 16.4%, and reduces the average vehicle waiting time in the real scenario from 65.42 s to 57.51 s, a reduction of 12.1%.
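For reference, the single clipping mode that the multi-segment mechanism replaces is the standard PPO clipped surrogate objective, sketched below in Python with PyTorch. The abstract does not specify the segment boundaries of the proposed multi-segment clipping, so only the baseline single-clip loss is shown; the tensor names and the dummy batch are illustrative assumptions.

```python
# A minimal sketch of the standard PPO clipped surrogate loss, assuming
# PyTorch tensors for the new/old log-probabilities and advantages. The
# thesis replaces this single clipping interval with a multi-segment
# clipping mechanism whose exact segments are not given in the abstract.
import torch

def ppo_clip_loss(new_logp: torch.Tensor,
                  old_logp: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Negative clipped surrogate objective averaged over a batch."""
    ratio = torch.exp(new_logp - old_logp)          # probability ratio r_t
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) surrogate, as in the original PPO paper.
    return -torch.min(unclipped, clipped).mean()

# Example with dummy data: a batch of 4 transitions.
new_lp = torch.log(torch.tensor([0.30, 0.25, 0.10, 0.40]))
old_lp = torch.log(torch.tensor([0.25, 0.25, 0.20, 0.30]))
adv = torch.tensor([1.0, -0.5, 0.8, 0.2])
print(ppo_clip_loss(new_lp, old_lp, adv))
```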