| With rapid economic development, the number of urban vehicles keeps increasing and urban traffic is becoming more and more congested, seriously affecting people’s daily travel and increasing the incidence of traffic accidents. Other problems are equally serious, including environmental pollution, economic losses, and a decline in overall quality of life. Because space and resources for expanding the infrastructure are lacking, improving traffic flow and Traffic Signal Control (TSC) within the existing infrastructure is a pressing issue. Traditional traffic signals are most commonly controlled by fixed-time, actuated, or adaptive methods, which are responsive to traffic conditions but do not fully cope with fluctuating traffic scenarios. In particular, they are not the best choice when the traffic flow is highly saturated. Because traffic flow is highly dynamic and complex, Reinforcement Learning (RL) is commonly employed to deal with these adaptive problems. This thesis investigates the problem of signal control in urban road networks. First, traditional traffic signal methods target individual intersections, but traffic flows at connected intersections usually affect each other. This thesis proposes a cooperative control method based on reinforcement learning that incorporates the impact of traffic flow from multiple adjacent intersections. Second, although reinforcement learning solves sequential decision-making tasks well, its models usually suffer from weak exploration ability and poor convergence when interacting with the environment during training. This thesis proposes imitation (demonstration) learning, which pre-trains the reinforcement learning model on demonstration data for fast convergence and improved performance. Third, to better integrate spatial and temporal features, this thesis proposes a dynamic spatial-temporal graph attention network for traffic signal control to fully exploit the potential spatial-temporal joint 
relations. Finally, based on meta-learning, this thesis proposes a meta spatial-temporal graph attention network for handling dynamically changing traffic flows, improving the efficiency of traffic signals and reducing vehicles’ waiting time. The specific research works and contributions are summarized as follows: (1) Traffic Signal Control With Reinforcement Learning Based on Region-Aware Cooperative Strategy. In urban road scenarios, consecutive intersections are usually connected, so if traffic congestion occurs at one intersection, other intersections may also be affected. However, the classic reinforcement learning method usually lets one agent control a single intersection, which often does not allow cooperative control of multiple intersections. Another approach is to set up a single model and train it on samples from all intersections, which has the disadvantage of requiring a larger model and stronger generalization ability. Neither of these methods achieves cooperative control of multiple intersections, and essentially neither considers the traffic flow effects of connected intersections. This thesis addresses the problem that single-agent-based traffic signal control methods cannot adapt to multiple intersections and proposes a cooperative control method for multiple intersection signals (RACS) built on a policy-based reinforcement learning algorithm. The method takes the state of neighboring intersections as part of its own state while also considering the influence of neighboring signal control policies. In multi-intersection scenarios, RACS reduces the waiting time by 48.9% and 31.0% on the synthetic and real datasets, respectively, compared to the existing method IA2C. (2) Intersection Structure Independent Traffic Signal Control Method at Multi-intersection. Traffic states at intersections often have symmetric or rotational characteristics. If the model can recognize such characteristics, then repeated training can be reduced, based on which a traffic demand modeling 
scheme independent of the intersection structure is designed in this thesis. In addition, reinforcement learning is weak in the early stage of training when interacting with the environment, and adjusting the learning rate may cause overfitting or underfitting. Almost all reinforcement learning algorithms face this problem. The thesis proposes a demonstration learning-based approach, Ape-X DQfD, which first trains a model using a traditional Self-Organizing Traffic Light (SOTL) method and then collects training data from SOTL. These data are used as demonstration data to pre-train the reinforcement learning model, which adapts strongly to the environment after pre-training, so that it quickly reaches convergence during formal training and the training time is shortened. Pre-training also improves the model’s performance and its control of the traffic signals. Experiments on three urban datasets confirm that the proposed method performs better than mainstream RL-based methods, with faster convergence and the lowest travel time, reducing travel time by 23.9%, 23.8%, and 11.6% on average, respectively. (3) Multi-intersection Traffic Signal Control Method Based on Spatial-temporal Feature Fusion. Graph neural networks differ from convolutional and recurrent neural networks in that they are more suitable for tasks with graph or graph-like structures and for data generated in non-Euclidean spaces. Some existing graph neural network-based signal control methods do not consider the traffic flow state of the past period. In practice, however, traffic flow is continuous, and the previous period’s state directly affects the switching of subsequent signals, so these time-dimensional features should be considered. On the other hand, some models based on spatial-temporal features directly combine temporal and spatial features without fully exploiting their intrinsic correlation. This thesis proposes a Dynamic Spatial-Temporal Graph Attention 
Network (DynSTGAT) for traffic signal control. It uses a Temporal Convolutional Network (TCN) and a Graph Attention Network (GAT) to obtain the spatial-temporal features of the past period, while an LSTM and a GAT are fused to obtain the spatial-temporal features of the current moment. A DQN is used to predict the state of the signal light at the next time step. The experimental results show that the travel time is 13.8% less than that of the CoLight method on the synthetic dataset (with configuration 2). Moreover, on the real datasets, the travel time is 7.2% and 3.6% less than that of CoLight, respectively. (4) A Meta-learning Based Method for Dynamic Multi-intersection Traffic Signal Control. Graph neural networks are often used in tasks with graph structures, but the properties of the nodes in these graphs are usually fixed. In reality, however, node properties are more often than not constantly changing, as in social networks. How to deal with such dynamically changing scenarios is a critical issue. In traffic signal control, if an intersection is considered a node and the connected intersections its neighboring nodes, the properties of these nodes are clearly changing all the time because the traffic flow is dynamic. To address this situation, this thesis proposes a Meta-learning Based Spatial-Temporal Graph Attention Network named MetaSTGAT. The meta-knowledge learning module in MetaSTGAT dynamically learns the weights among nodes based on the changing node features. Updating these weights optimizes the graph attention network so that the whole model obtains better results. The meta-knowledge learning module consists of a two-layer fully connected network, and the experimental results show that MetaSTGAT performs better than the Spatial-Temporal Graph Attention Network (STGAT) alone. On four synthetic and two real-world datasets, MetaSTGAT reduces the travel time by 12.23%, 19.30%, 13.84%, 10.91%, 8.24%, and 8.74%, respectively, compared with the graph network method CoLight. In 
summary, this thesis explores the traffic signal control problem, moving from a simple connected multi-intersection scenario, to a scenario with heterogeneous intersection structures, to a dynamically changing and complex multi-intersection scenario. It proposes four reinforcement learning-based multi-intersection signal control methods, which effectively improve traffic efficiency, increase the traffic flow at intersections, shorten vehicle travel time, and reduce vehicle waiting time. |
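
The region-aware idea in contribution (1), where an intersection's state includes the states of its neighboring intersections, can be illustrated with a minimal Python sketch. This is not the thesis's RACS implementation: the function name `region_aware_state`, the per-intersection feature vectors, the `max_neighbors` parameter, and the zero-padding scheme are all assumptions made for illustration, and the influence of neighboring policies is not modeled here.

```python
from typing import Dict, List


def region_aware_state(
    local_states: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
    intersection: str,
    max_neighbors: int = 4,
) -> List[float]:
    """Build an agent's observation from its own and its neighbors' states.

    Hypothetical helper: assumes every intersection reports a feature
    vector of the same length (for example, per-lane queue lengths).
    """
    own = list(local_states[intersection])
    dim = len(own)

    # Keep at most `max_neighbors` neighbors, in the order they are listed.
    nbrs = neighbors.get(intersection, [])[:max_neighbors]

    state = own
    for name in nbrs:
        state.extend(local_states[name])

    # Zero-pad so every agent sees a fixed-length vector even when an
    # intersection has fewer neighbors than `max_neighbors`.
    state.extend([0.0] * (dim * (max_neighbors - len(nbrs))))
    return state


if __name__ == "__main__":
    # Illustrative data only: three intersections on a line, B - A - C.
    states = {"A": [3.0, 1.0], "B": [0.0, 2.0], "C": [5.0, 5.0]}
    graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
    print(region_aware_state(states, graph, "A", max_neighbors=2))
    print(region_aware_state(states, graph, "B", max_neighbors=2))
```

A fixed-length vector is chosen here so that one policy network could, in principle, be shared across intersections with different numbers of neighbors; other encodings (for example, attention over neighbors) are equally plausible.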