
Research On Multi-agent Deep Reinforcement Learning Algorithms For Traffic Signal Control

Posted on: 2022-11-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S T Yang
Full Text: PDF
GTID: 1482306764460084
Subject: Computer Science and Technology
Abstract/Summary:
Most cities in the world suffer from increasing traffic congestion, which has a series of negative effects on public travel and on the development of society as a whole, such as travel delay, vehicle fuel consumption, and environmental pollution. Among the many factors that cause traffic congestion, signal-controlled intersections are one of the most common bottlenecks in the urban traffic environment, so traffic signal control is the core of urban traffic control. Reinforcement Learning (RL) has received extensive attention in the traffic signal control area. Although a series of RL-based traffic signal control algorithms have been proposed, the existing algorithms still have drawbacks because they are limited by the feature representation of traffic networks and by the cooperative control between different agents. To alleviate these drawbacks, and building on existing research, this dissertation focuses on two directions: one is to produce the control policy of each agent based on the learned feature representation of traffic networks; the other is to coordinate the control policies of different agents. The research work of this dissertation is as follows:

1) The existing single-agent RL algorithms cannot trade off bias and variance well; in addition, some useful key information (such as the distance between adjacent intersections) is ignored, which may lead to non-optimal traffic signal control. To address these drawbacks, we propose a decentralized multi-agent coordination graph algorithm, referred to as the Multi-step return and Off-policy Asynchronous Advantage Actor-Critic Graph (MOA3CG) algorithm, which is based on the proposed single-agent Multi-step return and Off-policy Advantage Actor-Critic (MOA2C) algorithm and a coordination graph. The MOA3CG algorithm makes traffic-signal decisions based on current traffic states, observation history, and other information. In addition, an Adjusting Matrix of Traffic Signal Phase Control (AMTSPC) is proposed, which determines the optimal action selection (i.e., the optimal traffic signal selection) by considering the distance between adjacent intersections. Experimental results show that the MOA3CG algorithm is better than the state-of-the-art algorithms in terms of multiple traffic performance metrics.

2) The existing hierarchical deep RL algorithms either design the latent goals manually or acquire the latent goals from the environment, which may lead to non-optimal low-level policies. To improve on this, a single-agent RL algorithm for learning hierarchical goals is first proposed, referred to as the Learned-goal Soft Actor-Critic (LSAC) algorithm, which automatically learns the optimal latent goals and then uses them in the low-level policy. In addition, a multi-agent framework faces rapid growth of the state space as the number of control agents increases. To address this problem, a Semi-decentralized Feudal Multi-agent (SFM) framework is proposed, which partitions the control area and uses regional agents to cooperate with different local agents. Combining the two methods proposed above, we propose an SFM-LSAC algorithm for multi-intersection traffic signal control. Extensive experimental results demonstrate that the SFM-LSAC algorithm outperforms other state-of-the-art algorithms in terms of multiple traffic performance metrics.

3) Existing algorithms adopt special multi-agent settings for certain traffic networks to generate cooperative traffic-signal policies, but these special settings hinder the policies from transferring and generalizing to new traffic networks. In addition, the time-varying vehicles traversing the traffic networks cannot be effectively represented, and the heterogeneous features of different objects in traffic networks cannot be effectively captured. Based on these observations, we propose an algorithm, referred to as the Inductive Heterogeneous Graph Multi-agent Actor-critic (IHG-MA) algorithm, for multi-intersection traffic signal control. The IHG-MA algorithm has two features. First, it conducts representation learning using a proposed inductive heterogeneous graph neural network (IHG), which is an inductive algorithm. Unlike algorithms based on homogeneous graph neural networks, the IHG algorithm encodes not only the heterogeneous features of each node but also heterogeneous structural (graph) information. Second, it conducts policy learning using a proposed multi-agent actor-critic (MA), which is a decentralized cooperative framework. The MA framework employs the final embeddings to compute the Q-value and policy, and then optimizes the whole algorithm via the Q-value and policy loss functions. Extensive experimental results show that the IHG-MA algorithm outperforms the state-of-the-art algorithms on multiple traffic performance metrics and can be effectively transferred to both synthetic and real-world traffic networks.

4) The existing RL algorithms only adopt the traffic-network information of each step (i.e., short-term information), while long-term information (such as the task of each agent) is ignored, which may lead to non-optimal traffic-signal policies. In addition, existing algorithms based on meta-learning cannot effectively handle diversity among tasks, because their shared parameters describe only the 'average' of the source tasks. Based on these observations, we propose a MEta Multi-agent Advantage Actor-critic (ME-MA2C) algorithm for multi-intersection traffic signal control. The ME-MA2C algorithm has two components. First, it conducts meta-learning (ME) using a proposed task-neighbor encoder, which is a meta-learning algorithm. The ME algorithm encodes both short-term and long-term information to learn the meta-embedding and meta-knowledge, which helps to generate the optimal traffic-signal policies. Second, it conducts policy learning using a Multi-agent Advantage Actor-Critic (MA2C), which is a decentralized multi-agent framework. The MA2C framework employs the learned meta-embedding and meta-knowledge, and then optimizes the whole algorithm to generate transferable traffic-signal policies. Extensive experimental results show the superior performance of the ME-MA2C algorithm over the state-of-the-art algorithms on multiple traffic performance metrics. The ME-MA2C algorithm trained on synthetic road networks can be effectively transferred to both synthetic and real-world traffic networks with two kinds of traffic flows, and achieves effective traffic signal control.
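
The abstract names multi-step returns as the means of trading off bias and variance in the first contribution, but it gives no formula. As a rough illustration only, the sketch below computes the textbook n-step bootstrapped return and the corresponding advantage for a short trajectory. The function names, the argument layout, and the discount factor are assumptions made for this example and are not taken from the dissertation; this is not the MOA2C update itself.

```python
from typing import List


def n_step_returns(rewards: List[float], values: List[float],
                   n: int, gamma: float = 0.99) -> List[float]:
    """Generic n-step bootstrapped returns (illustrative, not MOA2C).

    rewards[t] is the reward received after acting in state t.
    values has len(rewards) + 1 entries: values[t] is the critic's
    estimate for state t, and values[-1] bootstraps the final state.
    """
    horizon = len(rewards)
    returns = []
    for t in range(horizon):
        end = min(t + n, horizon)  # truncate at the end of the trajectory
        g = 0.0
        for k in range(t, end):
            g += (gamma ** (k - t)) * rewards[k]   # discounted real rewards
        g += (gamma ** (end - t)) * values[end]    # bootstrap from the critic
        returns.append(g)
    return returns


def advantages(rewards: List[float], values: List[float],
               n: int, gamma: float = 0.99) -> List[float]:
    """Advantage estimate: n-step return minus the critic's value."""
    rets = n_step_returns(rewards, values, n, gamma)
    return [g - v for g, v in zip(rets, values[:-1])]
```

With n = 1 the target leans heavily on the critic's estimate (more bias, less variance); a larger n uses more sampled rewards before bootstrapping (less bias, more variance), which is the trade-off the abstract refers to.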
Keywords/Search Tags:Multi-agent Reinforcement Learning, Heterogeneous Graph Representation Learning, Meta-learning, Cooperative Traffic Signal Control, Mutual Information