Reinforcement learning (RL) is widely used in the study of intersection signal control and has shown excellent performance in improving traffic efficiency and reducing fuel consumption, mainly at individual intersections. However, to achieve coordination in a multi-intersection arterial traffic signal control (ATSC) system, RL-based control methods must confront the training difficulties of multi-agent systems, such as the curse of dimensionality, which is aggravated by the delayed-reward property of the multi-intersection artery (the overall traffic-flow state does not change instantly with signal operations). In this paper, the Delayed Reward's Multi-Agent Arterial Signal Control (DMAS) method is proposed. Considering the topology of the communication network of the ATSC system in the connected-vehicle (CV) environment, DMAS adopts a multi-agent actor-critic training regimen based on MADDPG, wherein the individual signal controllers act as actor agents and the central controller of the artery operates as an integrated evaluator with multiple critic agents. Furthermore, to address the delayed-reward property, we enhance MADDPG by embedding a return decomposition module through which DMAS converts the delayed reward into immediate rewards. We introduce a dynamic delayed-reward prediction model to implement the contribution analysis from RUDDER: the total delayed reward is assigned to each step according to the difference between the prediction values of adjacent steps. Simulation results show that the average reward of MADDPG is 18% higher than that of DDPG and is further increased by 3% with the return decomposition module. Moreover, the training curves indicate more stable performance.
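As a concrete illustration of the return decomposition step described above, the following Python sketch shows one way a RUDDER-style redistribution could be realized: each step receives the change in a predictor's estimate of the episode return between adjacent steps. The function name, the predictor outputs, and the residual handling are illustrative assumptions, not the paper's implementation; the prediction model itself is not shown.

```python
import numpy as np

def redistribute_delayed_reward(return_predictions, delayed_return):
    """Sketch of RUDDER-style return decomposition (illustrative only).

    `return_predictions[t]` is assumed to be a learned model's prediction
    of the episode return given the trajectory up to step t. Each step is
    credited with the change in predicted return caused by that step, so
    a single delayed reward is spread over the episode as immediate rewards.
    """
    preds = np.asarray(return_predictions, dtype=float)
    # Contribution of step t = prediction after step t minus prediction before it.
    immediate = np.diff(preds, prepend=0.0)
    # Assign any prediction error to the final step so the redistributed
    # rewards sum exactly to the observed delayed return (one design choice;
    # spreading the residual uniformly over the steps is another).
    immediate[-1] += delayed_return - immediate.sum()
    return immediate

# Toy example: a five-step episode whose only reward, 10.0, arrives at the end.
print(redistribute_delayed_reward([1.0, 2.5, 6.0, 8.0, 9.5], delayed_return=10.0))
```

Under this scheme the immediate rewards sum to the original delayed return by construction, so the redistribution preserves the optimal policy while giving each signal-control step a dense training signal.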