| The sortie rate of carrier-based aircraft is often used as a core indicator of an aircraft carrier's combat performance, and the scheduling efficiency of aviation support operations (hereinafter referred to simply as support operations) is one of the key factors affecting this indicator. Carrier-based aircraft support operations scheduling is a real-time scheduling problem that optimizes the support operations process, with the goal of improving the dispatch efficiency of carrier-based aircraft under the joint constraints of limited space, time, and support resources. It is crucial to the combat effectiveness of aircraft carrier formations and has long been a focus of the world's military powers. In recent years, reinforcement learning has developed rapidly as a subfield of machine learning and has achieved many fruitful results in scheduling optimization. It is well suited to sequential decision-making problems such as support operations scheduling, and can plan efficient schedules in dynamic, uncertain, and real-time environments. On this basis, this thesis conducts a series of exploratory studies on the Aircraft Support Operations Scheduling (ASOS) problem. First, we propose a sequential decision-making algorithm based on reinforcement learning, consisting of two modules: policy learning and online decision-making. It optimizes the scheduling process from a global perspective by maximizing long-term cumulative rewards. However, because the algorithm adopts the "First Come First Serve" model, its decisions are short-sighted: it considers only the current candidate matching and may miss a better matching that arrives later. Therefore, we propose an adaptive batching strategy based on reinforcement learning to optimize the scheduling process from the perspective of batch processing. This strategy can adaptively choose an appropriate batch size according to the
real-time scheduling environment, and then allocate the batches. Finally, building on the research on the ASOS problem in military scenarios, this thesis transfers the theoretical results of the reinforcement learning-based research to the civil urban logistics scheduling problem. The main contributions of this thesis can be summarized as follows: (1) The ASOS problem is formally defined, and an efficient sequential scheduling algorithm based on reinforcement learning is proposed for it. The algorithm considers not only the immediate reward generated in the scheduling process but also the long-term benefits, so as to optimize the long-term utility of the scheduling process. Specifically, it first models the real-time matching between carrier-based aircraft support operations and support positions as a POMDP-based sequential decision-making problem, and then adopts a DQN-based learning and planning framework to solve it. To verify the performance of the algorithm, a simulation environment is constructed and the algorithm is evaluated experimentally on a simulated data set. The results show that the algorithm can effectively meet the needs of real-time scheduling of carrier-based aircraft. (2) To address the bottleneck in support operations scheduling performance caused by the inherent "short-sighted" defect of the "First Come First Serve" sequential decision-making model, we propose an adaptive sliding window decision algorithm based on reinforcement learning to solve the ASOS problem. The algorithm adaptively divides the sliding windows (batches) according to the real-time scheduling environment and then performs batch allocation (i.e., matches the carrier-based aircraft and support positions in each sliding window). At the same time, we design a novel state representation for the algorithm, which integrates several key factors such as quantity, time, and movement cost in the scheduling process of carrier-based aircraft support operations to further
improve the performance of sliding window partitioning. Finally, extensive experiments are carried out to verify the performance of the algorithm, and the results show that it can achieve high-quality support operations scheduling while meeting real-time requirements, and that it outperforms the algorithm proposed in (1). (3) The experience gained from exploring support operations scheduling in military scenarios is transferred to civilian scenarios, and a Real-Time City Express Delivery via Adaptive Sliding Window (RTDW) problem is formally defined. Compared with the ASOS problem, the RTDW problem has a larger number of tasks and more flexible spatial routes, so the above approaches cannot be applied directly. Therefore, we propose a Sequential Matching Algorithm (SMA) and a Time-aware Batch Matching (TBM) algorithm to solve it. In addition, inspired by the idea in (2), we present a DRL-based algorithm to optimize the TBM algorithm. It is implemented with deep reinforcement learning and is equipped with a novel combined feature vector as the perceptual state to adaptively determine the sliding window size, thus bringing good long-term benefits to the platform. We then theoretically analyze the competitive ratio of the batch algorithm to provide a guarantee on the performance of the proposed algorithm. Finally, extensive experiments on two real datasets show that the proposed algorithm achieves desirable matching quality and efficiency under different parameter settings. |
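
The contrast between "First Come First Serve" matching and batch (sliding window) matching described above can be illustrated with a toy sketch. This is not the thesis's algorithm: the one-dimensional positions, the absolute-distance cost, the function names `fcfs_match` and `batch_match`, and the fixed `window` parameter are all assumptions made for illustration. In particular, the thesis chooses the window size adaptively with reinforcement learning, which is not reproduced here; each batch is instead solved by brute-force enumeration.

```python
from itertools import permutations


def cost(aircraft, station):
    # Hypothetical movement cost: distance along a 1-D deck coordinate.
    return abs(aircraft - station)


def fcfs_match(aircraft, stations):
    """Match each arriving aircraft to its nearest free station, one at a time."""
    free = list(stations)
    total = 0
    for a in aircraft:
        best = min(free, key=lambda s: cost(a, s))
        free.remove(best)
        total += cost(a, best)
    return total


def batch_match(aircraft, stations, window):
    """Collect arrivals into fixed-size windows and match each window jointly.

    Assumes there are at least as many stations as aircraft. Each window is
    solved exactly by enumerating assignments, which is only practical for
    very small batches.
    """
    free = list(stations)
    total = 0
    for start in range(0, len(aircraft), window):
        batch = aircraft[start:start + window]
        # Try every ordered choice of free stations for this batch.
        best = min(
            permutations(free, len(batch)),
            key=lambda chosen: sum(cost(a, s) for a, s in zip(batch, chosen)),
        )
        total += sum(cost(a, s) for a, s in zip(batch, best))
        for s in best:
            free.remove(s)
    return total


if __name__ == "__main__":
    arrivals = [1, -2]   # aircraft positions, in arrival order
    stations = [0, 3]    # support position coordinates
    print("FCFS cost: ", fcfs_match(arrivals, stations))
    print("Batch cost:", batch_match(arrivals, stations, window=2))
```

In this toy instance the first aircraft greedily takes the nearer station and forces the second into a long move, so the FCFS total cost is 6, while matching both aircraft in a single window gives a total cost of 4. With `window=1` the batch procedure reduces to FCFS, which is why the choice of window size matters.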