| In recent years, with the rapid development of the economy and technology, people have gained a wide variety of travel options. Rail transit, as a safe, efficient, and economical mode of transportation, has become the first choice for a growing number of people. Urban rail transit, as one form of rail transit, plays a major role in keeping urban transportation running normally. As high line operation density becomes the norm, designing effective train operation control methods that ensure safe, efficient, and energy-saving train operation is of great significance and value to the development and smooth operation of urban rail transit. This paper focuses on the online optimization of train operation control strategies for urban rail transit: deep reinforcement learning algorithms are used to optimize the control strategy online, and the parameters of the train motion model are identified to improve calculation accuracy. The main research contents are as follows: (1) To address the online optimization of train operation control strategies under complex conditions, a train operation control strategy based on deep reinforcement learning is designed. By combining the Deep Deterministic Policy Gradient algorithm with experiential knowledge, the Deep Deterministic Policy Gradient and Experiential Knowledge (DDPGK) algorithm is designed, and the structures of its policy function network and value function network are given. Experiential knowledge of train operation control and two policy inference mechanisms are established under the constraints of the speed limit, the maximum acceleration, the train running time, and the starting position of the train running route. Based on the characteristics and performance indicators of train operation control, the state, action, and reward function of the train operation control environment are designed, and the flow of the DDPGK algorithm is given. With running speed and acceleration constrained, simulation experiments on online optimized train operation control are designed for different planned running times, different speed limits across route sections, and temporary adjustments of the arrival time. These experiments verify the effectiveness and adaptability of the DDPGK algorithm. (2) To address the uncertainty of train kinematic parameters and the slow convergence and large reward oscillations of the DDPGK algorithm, a train operation control method based on train motion model parameter identification and deep reinforcement learning is designed. The Improved Cuckoo Search algorithm is used to identify the parameters of the train motion model and to update its basic resistance parameters, which improves the accuracy of train control during line operation. The Twin Delayed Deep Deterministic Policy Gradient algorithm, an improvement on the Deep Deterministic Policy Gradient algorithm, is combined with experiential knowledge to design the Twin Delayed Deep Deterministic Policy Gradient and Experiential Knowledge (TD3K) algorithm. The corresponding policy function network and value function network structures are constructed and the algorithm flow is given, which mitigates the slow convergence and large reward oscillations observed when training the DDPGK algorithm. The simulation experiments verify that identifying the basic resistance parameters is necessary for studying train operation control strategies and traction energy consumption, and they demonstrate the advantages of the TD3K algorithm based on model parameter identification in optimizing train traction energy consumption. |
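
The abstract attributes the improved convergence of TD3K to its Twin Delayed Deep Deterministic Policy Gradient (TD3) base. As a rough illustration of why that may help, the sketch below shows two standard TD3 ingredients: the clipped double-Q target and target policy smoothing. It is not the author's TD3K implementation. The function names, the scalar action (assumed here to be a normalized acceleration command), and all parameter values are hypothetical, and the experiential-knowledge component is not modeled.

```python
def td3_target(reward: float, done: bool, q1_next: float, q2_next: float,
               gamma: float = 0.99) -> float:
    """Clipped double-Q target: bootstrap from the smaller of two critic estimates.

    Taking the minimum counteracts the value overestimation that a single
    critic (as in plain DDPG) tends to accumulate.
    """
    if done:
        # No bootstrapping past a terminal state (e.g. the train has arrived).
        return reward
    return reward + gamma * min(q1_next, q2_next)


def smoothed_target_action(action: float, noise: float, noise_clip: float,
                           low: float, high: float) -> float:
    """Target policy smoothing: add clipped noise, then clip to the action range.

    `action` is assumed to be a normalized acceleration command in [low, high];
    `noise` would normally be sampled from a zero-mean Gaussian.
    """
    clipped_noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, action + clipped_noise))


if __name__ == "__main__":
    # Two critics disagree about the next state's value; the target uses the lower one.
    print(td3_target(reward=1.0, done=False, q1_next=10.0, q2_next=8.0, gamma=0.5))
    # Noise of 0.7 is clipped to 0.5, and the result is clipped to the action bound.
    print(smoothed_target_action(0.9, 0.7, noise_clip=0.5, low=-1.0, high=1.0))
```

In full TD3 the actor and the target networks are also updated less often than the critics (the "delayed" part), which this fragment omits for brevity.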