| In recent years, with the rapid growth of China’s economy, urban rail transit has entered a period of rapid development and construction and has become a major energy consumer in cities. Many researchers have studied the Automatic Train Operation (ATO) system of urban rail transit and proposed optimization methods to reduce the traction energy consumed by train operation. This research has focused mainly on optimizing the recommended target speed curve for energy-saving operation and the tracking of that curve, which is essentially off-line optimization. In actual subway operation, the running environment and the motion model of the train are not fixed, which leads to large deviations between simulated tracking performance and actual operation. Off-line optimization is also separated from real-time train control: if a disturbance occurs during operation and the train arrival time must be adjusted dynamically, an off-line algorithm cannot adapt. At present, there is little research on on-line optimal control algorithms for trains. To address these limitations of off-line optimization, this paper proposes an on-line optimization algorithm for automatic train operation based on deep reinforcement learning. First, the paper studies in depth the evolution from traditional reinforcement learning to deep reinforcement learning algorithms, the merits and demerits of the different algorithms, and the problem models to which each applies. This study consolidates the theoretical foundation for choosing reinforcement learning to optimize automatic train operation on-line. The on-line algorithm designed in this paper differs from the traditional double-layer ATO control structure. It does not
need a recommended target speed curve as external supervision information, nor does it need accurate model information about the train running environment. Instead, the train controller is constructed as a neural network, and a multi-objective reward function covering energy consumption, comfort and parking accuracy, together with a safety protection adjustment of the controller output action, is designed. During train operation, state information such as train speed and position and environmental information such as line speed limits and gradients are collected in real time, and the train operation energy consumption and the parking accuracy on arrival are optimized on-line. An on-line optimization simulation environment is built on OpenAI Gym, and the effectiveness of the designed algorithm is verified under different line conditions. To further evaluate the train performance in the simulation results, the paper also runs a simulation experiment with an off-line optimization algorithm for automatic train operation based on an ant colony algorithm. A comparison of the two sets of simulation results shows that the train energy consumption under on-line optimization is clearly better than under off-line optimization, while indexes such as punctuality rate and parking accuracy meet the requirements. To verify the effectiveness of real-time on-line optimization, the paper simulates a dynamic adjustment of the train arrival time during operation. The simulation results show that the on-line algorithm responds in time to disturbances in actual train operation and, after the adjustment, still outputs an operation strategy that meets the requirements of traffic safety and punctuality and still satisfies parking accuracy and the other indicators on arrival at the station. There are 43 figures, 4 tables and 45 references in the paper. |
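
The abstract does not give the form of the reward function or of the safety protection adjustment, so the following is only a minimal Python sketch of how such components might look. All names (`RewardWeights`, `protect_action`, `step_reward`), the weights, the acceleration bounds, the energy scaling and the neglect of regenerative braking are illustrative assumptions, not the design used in the paper.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Hypothetical trade-off weights; the paper's actual values are not given."""
    energy: float = 1.0    # penalty per megajoule of traction energy
    comfort: float = 0.5   # penalty per unit of jerk (m/s^3)
    parking: float = 10.0  # penalty per metre of stopping error


def protect_action(accel, speed, speed_limit, max_accel=1.0, max_brake=-1.0):
    """Assumed safety adjustment: clip the commanded acceleration to the
    allowed range and forbid traction once the speed limit is reached."""
    accel = max(max_brake, min(max_accel, accel))
    if speed >= speed_limit:
        accel = min(accel, 0.0)
    return accel


def step_reward(mass, accel, prev_accel, speed, dt, w, done=False, stop_error=0.0):
    """Per-step reward combining energy, comfort and (at the end) parking accuracy."""
    # Traction energy over the step; braking is assumed to consume no energy.
    traction_force = mass * max(accel, 0.0)   # N
    energy_mj = traction_force * speed * dt / 1e6
    # Comfort is approximated by the change of acceleration (jerk).
    jerk = abs(accel - prev_accel) / dt
    reward = -w.energy * energy_mj - w.comfort * jerk
    # The parking error is only known when the train has stopped.
    if done:
        reward -= w.parking * abs(stop_error)
    return reward


if __name__ == "__main__":
    w = RewardWeights()
    a = protect_action(accel=2.0, speed=10.0, speed_limit=20.0)
    print(a, step_reward(200_000.0, a, 0.5, 10.0, 1.0, w))
```

In a Gym-style environment, a function like `protect_action` would be applied to the network output before the train dynamics are advanced, and `step_reward` would supply the scalar reward returned from each step.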