With the rapid development of urban traffic and the growing function and density of urban roads, scholars abroad have begun to research adaptive traffic signal control, a promising approach to alleviating congestion. Urban transportation systems are nonlinear, dynamic, uncertain, fuzzy, and complex, so although traditional adaptive traffic signal control systems and intelligent control methods have achieved certain results, they cannot adapt well to variations in traffic flow and rely heavily on a traffic model. Reinforcement learning (RL) requires little in the way of a mathematical model or prior knowledge of the external environment, so it can achieve good learning performance in large state spaces and complicated nonlinear systems. Agent-based RL, proposed by many scholars, therefore has broad prospects for development in adaptive traffic signal control. This study employs a traffic signal control agent for each signalized intersection. Building on an analysis of the process and effectiveness of standard reinforcement learning for adaptive traffic signal control, the thesis studies the application of several typical reinforcement learning algorithms to adaptive traffic control, including a distributed Nash Q-learning algorithm, a multi-interactive history learning algorithm, and a policy gradient ascent algorithm.
The focus and innovative achievements of the thesis are as follows:

(1) Construction of a system structure model for the intersection traffic signal control agent

Because intersection traffic flow is subject to considerable interference, dynamics, and uncertainty, a hybrid system structure model for the intersection traffic signal control agent was established by fusing cognitive and reactive agent structures, based on the agent BDI theory model and following the "perception-cognition-behavior" mode.

(2) Realization of a standard reinforcement learning algorithm for adaptive traffic signal control

An independent standard reinforcement learning method, Q-learning, was used for intersection traffic signal control, and the realization process of the Q-learning algorithm was analyzed. Compared with the traditional fixed-timing control method, Q-learning was effective. To address the curse of dimensionality in the independent standard reinforcement learning algorithm, the algorithm was extended by introducing a coordination mechanism, and the convergence and effectiveness of the coordination-based standard reinforcement learning were analyzed in comparison with the independent version.

(3) Design of a distributed Nash Q-learning algorithm for adaptive traffic signal control

According to the mutual dependence of traffic flow between intersections, a mathematical model of the interaction among intersection traffic signal control agents was built based on a non-zero-sum Markov game, and a distributed Nash Q-learning algorithm was put forward to solve the model. In the proposed algorithm, each intersection's traffic signal control agent selects its action according not only to its own Q-values but also to the Q-values of the other intersections' agents; the selected joint action is a Nash equilibrium of the current Q-values of all agents.
This method lets each intersection's traffic signal control agent learn to update its Q-values under joint actions and imperfect information. Theoretical analysis and simulation results show that the method converges, and its effectiveness was analyzed in comparison with the independent reinforcement learning algorithm, fixed-timing control, and algorithms from the relevant foreign literature.

(4) Design of a multi-interactive history learning coordination algorithm for self-adaptive traffic signal control

To overcome the assumptions of complete knowledge and single interaction in existing multi-agent learning coordination mechanisms for self-adaptive traffic signal control, a multi-interaction mathematical model for intersection traffic signal control agents was built based on game theory, and a multi-interactive history learning algorithm was constructed by introducing a memory factor. In the proposed model and algorithm, each intersection's traffic signal control agent plays a coordination game with its neighbors and updates its mixed strategy according to the payoff it receives, taking into account all historical interaction information from the neighboring intersections' agents. The learning rule assigns greater significance to recent payoff information than to past payoff information. The convergence of the approach was analyzed theoretically, and how parameters such as the memory factor, the learning probability, and the local traffic change probability affect the algorithm's performance was analyzed. In an experiment on coordinated control of the main intersections along an arterial, comparison with a method from the relevant foreign literature indicates that this method is effective.

(5) Design of a policy gradient approach for self-adaptive traffic signal control

Since the state of the urban traffic environment is difficult for the control system to perceive completely, self-adaptive traffic signal control was treated as a POMDP (Partially Observable Markov Decision Process), and a POMDP environment model of intersection self-adaptive traffic signal control was established. Building on the introduced GPOMDP algorithm and addressing the shortcomings of the general policy gradient estimation approach, the OLNAC algorithm for self-adaptive traffic signal control was designed by fusing the natural gradient with a value function method. How the related parameters affect the convergence of the two algorithms was analyzed by simulation experiment. Compared with the SAT (saturation-balancing technique), the uniform technique, the random technique, and a method from the relevant foreign literature, the proposed algorithms are effective and have a certain applicability for solving self-adaptive traffic signal control.
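The independent Q-learning control loop of item (2) can be illustrated with a minimal sketch. The state encoding (discretized queue levels on two approaches) and the two-phase action set here are hypothetical simplifications for illustration, not the thesis's actual design; the update rule itself is standard tabular Q-learning.

```python
import random
from collections import defaultdict

class QLearningSignalAgent:
    """Tabular Q-learning agent for one signalized intersection (sketch).

    State: e.g. discretized queue levels on the approaches (hypothetical).
    Action: 0 = green to north-south phase, 1 = green to east-west phase.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, actions=(0, 1)):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.actions = actions
        self.q = defaultdict(float)  # (state, action) -> Q-value, default 0

    def choose_action(self, state):
        """Epsilon-greedy selection over the phase actions."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Q-learning backup: Q <- Q + alpha * (r + gamma * max_a' Q' - Q)."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

A natural reward at each decision point is the negative total queue length, so that reducing delay increases the return.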
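In item (3), each agent selects the Nash equilibrium of the current Q-values of all agents. For two neighboring agents with a small discrete action set, a pure-strategy equilibrium can be found by best-response enumeration, as in the sketch below; the thesis's actual equilibrium computation and multi-intersection structure are not specified here, so the example Q-tables are hypothetical.

```python
from itertools import product

def pure_nash_equilibria(q1, q2):
    """Pure-strategy Nash equilibria of a two-player game (sketch).

    q1[a1][a2]: Q-value of agent 1 for joint action (a1, a2);
    q2[a1][a2]: Q-value of agent 2. A joint action is an equilibrium
    when neither agent can gain by unilaterally deviating.
    """
    n1, n2 = len(q1), len(q1[0])
    equilibria = []
    for a1, a2 in product(range(n1), range(n2)):
        best1 = all(q1[a1][a2] >= q1[b][a2] for b in range(n1))
        best2 = all(q2[a1][a2] >= q2[a1][b] for b in range(n2))
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria

# Hypothetical coordination-game-like Q-tables: both intersections
# do best when their phase choices are compatible.
q1 = [[3.0, 0.0], [0.0, 2.0]]
q2 = [[3.0, 0.0], [0.0, 2.0]]
print(pure_nash_equilibria(q1, q2))  # [(0, 0), (1, 1)]
```

When several equilibria exist, a tie-breaking rule (e.g. the one with the highest joint Q-value) is needed so that neighboring agents select the same joint action.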
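The learning rule of item (4) weights recent payoffs more heavily than old ones via a memory factor. One common way to realize such a rule is an exponentially discounted payoff average that then drives a mixed-strategy adjustment, sketched below; the exact update in the thesis may differ, and the strategy-adjustment step here is an illustrative assumption.

```python
def weighted_history_value(payoffs, memory_factor):
    """Exponentially discounted value of a payoff history (most recent last).

    A payoff t interactions in the past is weighted by memory_factor**t,
    with 0 < memory_factor < 1, so recent interactions dominate older ones.
    """
    value, weight_sum = 0.0, 0.0
    for t, payoff in enumerate(reversed(payoffs)):
        w = memory_factor ** t
        value += w * payoff
        weight_sum += w
    return value / weight_sum if weight_sum else 0.0

def update_mixed_strategy(probs, action_values, learning_rate=0.1):
    """Nudge the mixed strategy toward actions with higher history values,
    then renormalize so it remains a probability distribution (sketch)."""
    new = [p + learning_rate * v for p, v in zip(probs, action_values)]
    low = min(new)
    if low < 0:  # shift to keep all probabilities nonnegative
        new = [x - low for x in new]
    total = sum(new)
    return [x / total for x in new]
```

With a memory factor near 1 the agent averages over a long interaction history; near 0 it reacts almost only to the latest payoff, which mirrors the parameter-sensitivity analysis described above.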
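Item (5) builds on GPOMDP, which estimates the policy gradient in a POMDP from observations alone by maintaining a discounted eligibility trace of the policy's score function. A minimal sketch for a softmax policy over signal phases follows; the observation features and reward are hypothetical, and the thesis's OLNAC refinement (natural gradient plus a value function) is not reproduced here.

```python
import math

def softmax_probs(theta, obs_features):
    """Action probabilities of a log-linear (softmax) policy.
    theta[a][i] weights observation feature i for action a."""
    scores = [sum(w * f for w, f in zip(row, obs_features)) for row in theta]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def gpomdp_gradient(theta, episode, beta=0.9):
    """GPOMDP-style gradient estimate from one trajectory.

    episode: list of (obs_features, action, reward) tuples. The trace z
    accumulates the score grad log pi(a|obs) discounted by beta, and the
    gradient estimate is the running average of z * reward.
    """
    n_actions, n_feats = len(theta), len(theta[0])
    z = [[0.0] * n_feats for _ in range(n_actions)]     # eligibility trace
    grad = [[0.0] * n_feats for _ in range(n_actions)]  # running average
    for t, (obs, action, reward) in enumerate(episode, start=1):
        probs = softmax_probs(theta, obs)
        for a in range(n_actions):
            for i in range(n_feats):
                score = ((1.0 if a == action else 0.0) - probs[a]) * obs[i]
                z[a][i] = beta * z[a][i] + score
                grad[a][i] += (z[a][i] * reward - grad[a][i]) / t
    return grad
```

The discount beta trades bias against variance in the gradient estimate, which is the kind of parameter whose effect on convergence the simulation experiments examine.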