| The automated yard, which serves as the starting and ending point for container handling and storage within a container terminal, has a significant impact on the operational efficiency of the entire terminal. An efficient automated yard scheduling strategy can reduce container handling time and equipment energy consumption by optimizing the operating sequence of automated stacking cranes (ASCs), consequently enhancing terminal service quality. The automated yard scheduling problem is an NP-hard combinatorial optimization problem, and traditional methods such as mathematical programming, meta-heuristics, and rule-based heuristics have difficulty balancing solution speed and quality. In recent years, deep reinforcement learning (DRL), an artificial intelligence approach that aligns more closely with human thinking processes, has demonstrated substantial potential in solving combinatorial optimization problems. Compared with standard scheduling problems, such as the flow shop scheduling problem and the job shop scheduling problem, the automated yard scheduling problem involves operational processes and constraints that are more intricate to handle. Existing DRL-based scheduling methods have difficulty addressing this problem because they lack both a simulation environment capable of effectively replicating the container handling process and a network structure that can adequately represent the system’s states. Therefore, this thesis focuses on DRL-based scheduling methods for automated yards. Two DRL approaches are proposed, one for the static scheduling problem and one for the dynamic scheduling problem. The specific research tasks are as follows: The automated yard scheduling operation rules, optimization objectives, and constraints necessary for constructing the DRL environment are thoroughly examined. Subsequently, mixed-integer programming (MIP) models for the static and dynamic scheduling problems in automated yards are
established, respectively. The yard scheduling problem revolves around the cooperation of ASCs to handle a specific batch of containers. The static scheduling problem emphasizes ASC equipment conflicts in the yard’s handshake area and the methods for avoiding them. The primary objective is to minimize the makespan, given that the current bay and the destination bay of each container are known. In contrast, the dynamic scheduling problem deals with import containers that arrive dynamically, with uncertain arrival times but a known arrival order, and that can be handled only after they have entered the seaside IO. With a focus on the interaction mode and congestion phenomena between automated guided vehicles (AGVs) and the yard, a comprehensive optimization objective based on ASC working time and AGV waiting time is adopted. The instances and interaction environment needed for training and testing the DRL agent are investigated, and a new event-driven simulation environment for static and dynamic scheduling of automated yards is created. To automatically generate sufficient scheduling instances of varying sizes and distributions for DRL training, instance generation methods for static and dynamic scheduling are proposed, and their parameters are configured. Based on the constraints of the MIP models and the operation rules of yard scheduling, state update methods for the static and dynamic scheduling simulation environments are proposed, and the state spaces, including container positions and ASC operation states, are established. Leveraging these methods, the yard scheduling simulation environment required by the DRL algorithm is constructed. The challenge of extracting location-related features between containers and ASCs for the static scheduling problem in automated yards is explored. To address this issue, we introduce a feature extraction network based on the self-attention mechanism. In order to
reduce the sequence-dependent setup time before handling and the waiting time caused by avoiding equipment conflicts, this thesis introduces a new representation of the setup time and the waiting time as location-related relationships between containers and ASCs, and designs a feature extraction network based on the self-attention mechanism to efficiently extract the location-related features of containers and ASCs. The issue of location-related feature extraction between containers and ASCs for the dynamic scheduling problem in automated yards is explored. To address this challenge, we introduce a dynamic masked attention (DMA) mechanism and design a DMA feature extraction network accordingly. In the dynamic scheduling problem, import containers can be handled only after entering the seaside IO, so ineligible containers introduce a large amount of interfering information when location-related features are extracted. Using container eligibility information obtained from the simulation environment, the masking method dynamically sets the attention weights of ineligible containers to zero, which removes the interference of invalid information from the attention output and thereby enhances the agent’s ability to extract location-related features. The challenge of having the DRL agent fuse feature information with different semantics and dimensions for dynamic scheduling in automated yards is explored. In response to this challenge, we propose a local information complementary attention (LICA) framework. When optimizing the integrated objective of the dynamic scheduling problem in automated yards, it is essential to consider both location-related information and congestion-related information. Because the semantics and dimensions of these two types of information differ, the LICA framework is designed to fuse them. Specifically, the location-related information extracted through the DMA mechanism is divided into
location-related information for import containers and for export containers. Subsequently, through pointwise convolution, the location-related features of import containers are fused with congestion-related features extracted using an LSTM network. Finally, this fused information is concatenated with the location-related information of import containers and undergoes a fusion process via the attention mechanism. This framework enables the agent to effectively balance the working time of the ASCs against the waiting time of the AGVs, leading to improved optimization of the integrated objective function. Experiments show that the DRL methods proposed in this thesis for the static and dynamic scheduling problems in automated yards perform favorably in both solution speed and solution quality. Their advantage over traditional scheduling methods becomes increasingly pronounced as the problem size grows. Furthermore, the DRL scheduling methods generalize robustly to unseen scenarios with different scales or distributions, which is promising for practical applications. |
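
The masking idea behind the DMA mechanism can be illustrated with a small, self-contained sketch. It is not the network used in the thesis: it assumes single-query scaled dot-product attention in plain Python, and the function name `masked_attention` and its parameters are hypothetical. Ineligible containers receive a score of negative infinity before the softmax, which is one common way to make their attention weights exactly zero.

```python
import math


def masked_attention(query, keys, values, eligible):
    """Single-query scaled dot-product attention with an eligibility mask.

    Hypothetical illustration: `eligible[i]` is True when container i may be
    handled (for example, it has already entered the seaside IO).
    Returns (weights, output).
    """
    if not any(eligible):
        raise ValueError("at least one container must be eligible")

    scale = math.sqrt(len(query))
    # Ineligible containers get a score of -inf, so their softmax weight is 0.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if ok else float("-inf")
        for key, ok in zip(keys, eligible)
    ]

    # Numerically stable softmax; math.exp(-inf) evaluates to 0.0.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the value vectors; masked containers contribute nothing.
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


if __name__ == "__main__":
    query = [1.0, 0.0]                            # e.g. an ASC feature vector
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # container feature vectors
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    eligible = [True, True, False]                # third container not yet arrived
    weights, output = masked_attention(query, keys, values, eligible)
    print(weights, output)
```

Because the mask is passed in on every call, it can change from one decision step to the next as containers become eligible, which is the sense in which the masking is dynamic.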