| The Resource-Constrained Project Scheduling Problem (RCPSP) is a highly representative cumulative scheduling problem that arises widely in practical production and daily life. To be applicable in practice, a method for solving the RCPSP usually needs good solving efficiency, good solving accuracy, and the ability to generalize to unseen instances. However, most current methods tend to focus on one aspect at the expense of the others, making it difficult to strike a balance among the three. In recent years, with the development of artificial intelligence technology, some deep reinforcement learning models for solving disjunctive scheduling problems have emerged. They have the advantage of adaptive learning, enable real-time decision-making based on scheduling states, and have shown high solving efficiency and excellent generalization performance in related research. Therefore, this thesis proposes a deep reinforcement learning model based on the attention mechanism for the resource-constrained project scheduling problem. The specific research work and achievements are as follows. To apply reinforcement learning methods to the RCPSP and generate scheduling data, this thesis first studies the two schedule generation schemes of the RCPSP, the Serial Schedule Generation Scheme (SSGS) and the Parallel Schedule Generation Scheme (PSGS). It establishes Markov decision process models for both SSGS and PSGS and designs detailed scheduling actions, rewards, and feature representations of scheduling states, so as to obtain a scheduling environment for reinforcement learning. To enhance the adaptive learning ability of the solution model with respect to scheduling states and improve its generalization performance, this thesis combines the graph structure of the RCPSP with the possible long-distance correlation information within the project and proposes two neural networks based on the attention mechanism to enhance the original scheduling state features and capture the complex
implicit information within the project. To obtain scheduling actions from scheduling states, this thesis designs a policy network that uses a multilayer perceptron to make scheduling decisions based on the output of the feature extraction network. To address the insufficient aggregation of node information in the feature extraction network, global state information is concatenated at the input of the policy network. To optimize the proposed deep reinforcement learning model for solving RCPSP instances, the Proximal Policy Optimization (PPO) algorithm is used to train the model, improving its solving accuracy and minimizing the maximum completion time of the project. Numerical experiments are set up to verify the validity of the RCPSP scheduling environment and the proposed deep reinforcement learning model, to analyze the solving quality of the proposed method and its generalization performance on large-scale instances, and finally to evaluate its solving efficiency. The experiments show that, compared with widely used heuristic rules and advanced hyper-heuristic methods, the proposed model, after training on small-sized RCPSP instances, achieves good solving accuracy on the validation and test sets and generalizes to large-sized RCPSP instances while maintaining high solving efficiency. |
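
To make the Serial Schedule Generation Scheme mentioned in the abstract concrete, the following is a minimal Python sketch of the textbook SSGS, not the thesis's own environment. It assumes integer durations, renewable resources with constant capacities, and a precedence-feasible activity order. The function names `ssgs` and `makespan` and the three-activity toy instance are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def ssgs(order: Sequence[int],
         durations: Dict[int, int],
         demands: Dict[int, List[int]],
         preds: Dict[int, List[int]],
         capacities: List[int]) -> Dict[int, int]:
    """Serial SGS: take activities one at a time in the given order and
    start each at its earliest precedence- and resource-feasible time."""
    for j in order:
        if any(d > cap for d, cap in zip(demands[j], capacities)):
            raise ValueError(f"activity {j} can never fit within the capacities")

    horizon = sum(durations.values())  # trivial upper bound on the makespan
    # usage[k][t] = amount of resource k in use during period [t, t + 1)
    usage = [[0] * (horizon + 1) for _ in capacities]
    start: Dict[int, int] = {}

    for j in order:
        # The order must list every predecessor before its successor.
        if any(p not in start for p in preds[j]):
            raise ValueError(f"activity {j} appears before one of its predecessors")
        # Earliest start permitted by the precedence constraints.
        t = max((start[p] + durations[p] for p in preds[j]), default=0)
        # Shift right until all resources have room for the whole duration.
        while any(usage[k][u] + demands[j][k] > cap
                  for k, cap in enumerate(capacities)
                  for u in range(t, t + durations[j])):
            t += 1
        start[j] = t
        # Book the resources consumed by activity j.
        for k in range(len(capacities)):
            for u in range(t, t + durations[j]):
                usage[k][u] += demands[j][k]
    return start


def makespan(start: Dict[int, int], durations: Dict[int, int]) -> int:
    """Maximum completion time of the schedule."""
    return max(start[j] + durations[j] for j in start)


# Hypothetical toy instance: one renewable resource with capacity 2.
durations = {1: 3, 2: 2, 3: 2}
demands = {1: [1], 2: [2], 3: [1]}
preds = {1: [], 2: [], 3: [1]}
capacities = [2]

starts = ssgs([1, 2, 3], durations, demands, preds, capacities)
print(starts, makespan(starts, durations))
```

In a reinforcement learning setting of the kind the abstract describes, one would presumably let a policy choose the next eligible activity at each step instead of fixing `order` in advance. That mapping is an assumption of this sketch, not a description of the thesis's environment.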