| Traffic congestion caused by the growing number of cars is a serious problem in smart transportation research. Intersections are where vehicles from all directions converge, and a reasonable allocation of traffic light timing is an important factor affecting traffic conditions. Traffic lights therefore need to learn a reasonable timing scheme by interacting with the traffic environment. Adjusting the signal timing according to real-time traffic flow at the intersection can greatly reduce vehicle waiting time, delay time, and queue length. Deep reinforcement learning has become an effective method for traffic light timing because of its powerful perception and decision-making capabilities. However, owing to external environmental disturbances, internal parameter fluctuations, model structural defects, and differing policy mechanisms, deep reinforcement learning models suffer from parameter uncertainty, failure to converge or divergence, and poor exploration ability, which limit their further development in traffic light timing systems. On this basis, the main work and specific research results of this paper are as follows: (1) To address parameter uncertainty when the Deep Q-learning Network (DQN) is applied to a single-intersection traffic environment, a single-intersection traffic signal timing method combining DQN with the Extended Kalman Filter (EKF) is proposed. In this method, the uncertain parameter values of the estimation network serve as the state variables, the target network value, which includes the parameter uncertainty, serves as the observation variable, and the EKF system equations are constructed by combining the process noise, the estimation network value containing the uncertain parameters, and the system observation noise. Through the iterative update of the EKF, the optimal estimate of the real 
parameters in the DQN model is obtained, which remedies the poor timing strategy of DQN in the single-intersection traffic environment and improves the performance of the timing system. The experimental results show that the method is suitable for different traffic scenarios and traffic flows, resolves the parameter uncertainty problem in the DQN model, and obtains the optimal timing scheme to alleviate traffic congestion. (2) To address the poor convergence, divergence, and weak exploration of most deep reinforcement learning methods when applied to multi-intersection traffic environments, a multi-intersection traffic signal timing method based on Soft Actor-Critic (SAC) is proposed. SAC adds an entropy term, which measures the randomness of the policy, to the objective function of traditional reinforcement learning and maximizes both the cumulative expected reward and the entropy term. This improves the exploration ability of the model, so that the system can learn multiple optimal timing schemes and avoid falling into a local optimum or failing to converge by repeatedly selecting the same timing scheme. At the same time, policies with low reward values are discarded, which reduces data storage and sampling complexity and accelerates training, making SAC suitable for highly dynamic traffic environments. The comparative experimental results show that this method effectively addresses the poor convergence, divergence, and weak exploration of most deep reinforcement learning algorithms, enhances the stability of the system, and effectively improves vehicle traffic efficiency. |
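
The EKF formulation described in part (1), with the network parameters as the state and the target value as the observation, can be illustrated with a minimal sketch. It is not the paper's implementation: the single tanh unit standing in for the estimation network, the function names `q_value`, `q_jacobian`, and `ekf_step`, the random-walk state model, and the noise levels `q_noise` and `r_noise` are all assumptions chosen for illustration.

```python
import math


def q_value(theta, features):
    """Hypothetical stand-in for the estimation network: one tanh unit."""
    return math.tanh(sum(t * x for t, x in zip(theta, features)))


def q_jacobian(theta, features):
    """Gradient of q_value with respect to theta (the observation Jacobian)."""
    slope = 1.0 - q_value(theta, features) ** 2
    return [slope * x for x in features]


def ekf_step(theta, P, features, target, q_noise=1e-4, r_noise=1e-2):
    """One EKF predict/update cycle with the parameters as the state.

    Assumed state model:       theta_k = theta_{k-1} + w,  w ~ N(0, q_noise * I)
    Assumed observation model: target  = q_value(theta, features) + v,
                               v ~ N(0, r_noise)
    """
    n = len(theta)
    # Predict: parameters follow a random walk, so only the covariance grows.
    P_pred = [[P[i][j] + (q_noise if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Linearize the observation around the current parameter estimate.
    H = q_jacobian(theta, features)
    innovation = target - q_value(theta, features)
    PH = [sum(P_pred[i][k] * H[k] for k in range(n)) for i in range(n)]
    HP = [sum(H[k] * P_pred[k][j] for k in range(n)) for j in range(n)]
    S = sum(H[i] * PH[i] for i in range(n)) + r_noise  # innovation variance
    K = [PH[i] / S for i in range(n)]                  # Kalman gain
    # Update: correct the parameters and shrink the covariance.
    theta_new = [theta[i] + K[i] * innovation for i in range(n)]
    P_new = [[P_pred[i][j] - K[i] * HP[j] for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


# Example call: one update toward a target value of 0.5.
theta, P = ekf_step([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                    features=[1.0, 0.0], target=0.5)
```

In this sketch the target value plays the role of a noisy measurement of the estimation network's output, so each update moves the parameters toward values that explain the target while the covariance `P` tracks how uncertain each parameter remains.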