| In recent years, with the rapid development of economic globalization and the shipping industry, wharves have become important transit stations for large-scale cargo logistics. A coal terminal handles the import and export of coal. At an export terminal, trains carry inland coal into the port, the coal is stored in the storage yard, and it is then shipped out on large vessels. As terminal throughput increases, scheduling optimization has become a pressing issue. Scheduling at traditional coal terminals and similar terminals is mostly done manually, which consumes considerable manpower and material resources, is inefficient, and often leaves equipment idle or causes cargo congestion. In this paper, Shenhua Tianjin coal terminal is modeled and analyzed in detail, simulation experiments are carried out, and intelligent scheduling of the coal terminal is studied based on deep reinforcement learning. First, the principles of deep reinforcement learning are elaborated. Reinforcement learning is modeled as a Markov decision process, and its theory rests on the Bellman equation and value functions. It can solve many decision-making and optimization problems. Its basic algorithms include dynamic programming, Monte Carlo methods, and temporal-difference methods. Deep learning, a very popular supervised learning method in recent years, can effectively extract hidden features from data and approximate functions with high accuracy. Combining reinforcement learning with deep learning, using a neural network to fit the value function or the policy, yields deep reinforcement learning. Among its algorithms, the most commonly used, DQN, has achieved remarkable results on many problems and has attracted wide attention and study in recent years. Then, according to the working conditions of the specific terminal, this paper proposes a simulation model based on deep reinforcement learning, which takes the belt occupation condition, the remaining time of the tasks, and the task to be scheduled as the state, the route selection as the action, and the waiting time as the penalty. Using this model and two deep reinforcement learning algorithms, DQN and a state-value method, a preliminary simulation of the problem is carried out, and ideal scheduling results are obtained. Finally, actual data are substituted into the model, including the arrival times of the actual tasks, the task completion times, the actual route selected for each task, the train types, the coal types, and the stacking positions. Building on the simple simulation, the situation of multiple trains waiting is considered, and the scheduling optimization result under the actual data is obtained. The advantages of the scheduling algorithm are summarized by comparing this result with the actual data. |
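
The state, action, and penalty described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `ROUTES` layout, the `State` fields, the `elapsed` parameter, and the learning-rate and discount values. The update shown is a plain tabular temporal-difference (Q-learning) step; DQN would replace the table `q` with a neural network.

```python
from dataclasses import dataclass

# Hypothetical layout (not from the paper): each route occupies a fixed set of belts.
ROUTES = {0: (0, 1), 1: (0, 2), 2: (3,)}

@dataclass(frozen=True)
class State:
    belt_busy: tuple      # remaining occupied time of each belt
    task_duration: int    # duration of the task waiting to be scheduled

def step(state, route, next_task_duration, elapsed=0):
    """Assign the pending task to `route`; return (next_state, reward)."""
    belts = ROUTES[route]
    # The task waits until every belt on its route is free.
    wait = max(state.belt_busy[b] for b in belts)
    busy = list(state.belt_busy)
    for b in belts:
        busy[b] = wait + state.task_duration
    # Advance the clock by `elapsed` until the next task arrives.
    busy = tuple(max(0, t - elapsed) for t in busy)
    reward = -wait  # waiting time acts as the penalty
    return State(busy, next_task_duration), reward

def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular temporal-difference (Q-learning) update on dict `q`."""
    best_next = max(q.get((s_next, b), 0.0) for b in ROUTES)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]
```

In this sketch, choosing a route whose belts are still occupied produces a negative reward equal to the waiting time, so an agent that maximizes return is pushed toward routes that are free sooner.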