In recent years, the growing demand for personalization has placed higher requirements on the flexibility of production lines in manufacturing enterprises. The Flexible Job Shop Scheduling Problem (FJSP) asks how to allocate limited machine resources to multiple processing tasks within a reasonable time in a flexible manufacturing environment, so as to obtain a scheduling solution that optimizes objectives such as production efficiency and production energy consumption. Exploring FJSP optimization methods is important for providing high-quality scheduling solutions for flexible manufacturing processes and for promoting the flexible upgrading of manufacturing enterprises. At present, the traditional methods for solving the FJSP mainly include scheduling rules and meta-heuristic algorithms; the former depend strongly on the production environment, while the latter still fail to meet the real-time scheduling needs of actual production environments in terms of computational efficiency. As a learning-based optimization method, deep reinforcement learning has been applied to scheduling problems in many fields and has shown good performance. Therefore, this paper studies deep reinforcement learning methods for solving the FJSP. The main research contents are as follows:

1. A Markov decision model for solving the FJSP is developed. The solution process of the FJSP is transformed into a process of orienting disjunctive arcs, and the corresponding states, actions and rewards are designed for single-agent and multi-agent deep reinforcement learning methods respectively, providing model support for the subsequent design of FJSP solving methods. The states are disjunctive graphs, the actions are arc-orienting operations, and the reward function is tied to the optimization objective (a minimal sketch is given after this list).

2. A single-agent deep reinforcement learning method for solving the FJSP is proposed. A hierarchical decision-making model combines a neural network with scheduling rules: a graph neural network computes a high-dimensional embedding of the disjunctive graph and outputs operation probabilities for operation ranking, while scheduling rules are used for machine selection (see the second sketch below). The Asynchronous Advantage Actor-Critic (A3C) algorithm is used to optimize the model parameters and shorten training time. Experiments on standard benchmark instances verify the feasibility and efficiency of the proposed method.

3. A deep reinforcement learning method based on grouped multi-agents for solving the FJSP is proposed. Each machine is treated as an agent, and the agents make ranking decisions separately during the decision-making process. An additional global agent coordinates the machine agents by communicating with each of them and computing machine priorities. Meanwhile, a machine grouping strategy is adopted: agents corresponding to machines in the same group share a decision model, which reduces the number of models (see the third sketch below). The model parameters are optimized with a Proximal Policy Optimization (PPO) algorithm that can be trained offline. Experiments on test sets generated from standard instances verify the advantages of the proposed method over traditional methods in terms of both solution quality and computation time.

4. The proposed single-agent and multi-agent deep reinforcement learning methods are extended to the multi-objective FJSP based on weighted returns, using a weighted method to compute reward values and optimize the model (see the final sketch below). The good scalability of the proposed methods is verified through experiments on a test set generated from standard instances.
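As a concrete illustration of the Markov decision model in item 1, the following is a minimal Python sketch of an FJSP environment. It is not the thesis's implementation: it replaces the disjunctive-graph state with simplified ready-time bookkeeping, and all names (FJSPEnv, step, and so on) are hypothetical. It does follow the design above in that an action corresponds to committing an operation to a machine (orienting disjunctive arcs) and the reward penalizes growth of the makespan.

```python
# A minimal FJSP MDP sketch, assuming a simplified state (machine/job ready
# times) in place of the full disjunctive graph; names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class FJSPEnv:
    # jobs[j][o] is a dict {machine_id: processing_time} for operation o of job j
    jobs: list
    machine_ready: dict = field(default_factory=dict)
    job_ready: list = field(default_factory=list)
    next_op: list = field(default_factory=list)

    def reset(self):
        self.machine_ready = {}
        self.job_ready = [0.0] * len(self.jobs)
        self.next_op = [0] * len(self.jobs)
        return self._state()

    def _state(self):
        # Stand-in for the disjunctive-graph state described in the thesis.
        return (tuple(self.job_ready), tuple(sorted(self.machine_ready.items())))

    def makespan(self):
        return max([0.0] + list(self.machine_ready.values()))

    def step(self, job, machine):
        """Dispatch the next operation of `job` onto `machine` (equivalent to
        orienting the corresponding disjunctive arcs)."""
        op = self.jobs[job][self.next_op[job]]
        assert machine in op, "machine cannot process this operation"
        before = self.makespan()
        start = max(self.job_ready[job], self.machine_ready.get(machine, 0.0))
        end = start + op[machine]
        self.job_ready[job] = end
        self.machine_ready[machine] = end
        self.next_op[job] += 1
        # Reward tied to the optimization objective: penalize makespan growth.
        reward = before - self.makespan()
        done = all(self.next_op[j] >= len(self.jobs[j])
                   for j in range(len(self.jobs)))
        return self._state(), reward, done

# Usage: two jobs, two machines; operation 1 of job 0 can run on either machine.
env = FJSPEnv(jobs=[[{0: 3, 1: 2}, {1: 4}], [{0: 2}, {0: 3, 1: 1}]])
state = env.reset()
state, reward, done = env.step(job=0, machine=1)
```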
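The second sketch illustrates the hierarchical decision step of item 2, assuming PyTorch. The one-layer message-passing network and the shortest-processing-time (SPT) machine rule are illustrative choices standing in for the thesis's actual GNN and rule set: the upper level ranks eligible operations with a learned policy, and the lower level selects a machine with a fixed scheduling rule.

```python
# Hierarchical decision sketch: a GNN ranks operations, a rule picks machines.
# The architecture and the SPT rule are assumptions, not the thesis's code.
import torch
import torch.nn as nn

class OperationScorer(nn.Module):
    """Embeds disjunctive-graph nodes (operations) and scores them."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, hid_dim)   # message transform
        self.score = nn.Linear(hid_dim, 1)      # per-operation logit

    def forward(self, x, adj):
        # x: (n_ops, in_dim) node features; adj: (n_ops, n_ops) adjacency.
        # One propagation step over the graph, then per-node logits.
        h = torch.relu(adj @ self.msg(x))
        return self.score(h).squeeze(-1)

def select_action(scorer, x, adj, eligible_ops, op_machines):
    """Upper level: GNN ranks eligible operations; lower level: a scheduling
    rule (here, shortest processing time) picks the machine."""
    logits = scorer(x, adj)
    mask = torch.full_like(logits, float('-inf'))
    mask[eligible_ops] = 0.0                    # only schedulable operations
    probs = torch.softmax(logits + mask, dim=0)
    op = int(torch.multinomial(probs, 1))       # sample an operation to rank next
    machine = min(op_machines[op], key=op_machines[op].get)  # SPT rule
    return op, machine, probs[op]               # probs[op] feeds the A3C update
```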
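The third sketch shows the structural idea behind the grouped multi-agent setup of item 3: one policy network is shared by all machine agents in a group, and a global agent turns messages from the machine agents into machine priorities. The grouping criterion, network sizes, and all names here are hypothetical stand-ins.

```python
# Grouped multi-agent sketch: shared per-group policies plus a coordinator.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MachinePolicy(nn.Module):
    """Decision model shared by all machine agents in one group."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return torch.softmax(self.net(obs), dim=-1)

class Coordinator(nn.Module):
    """Global agent: maps machine-agent messages to machine priorities."""
    def __init__(self, msg_dim):
        super().__init__()
        self.priority = nn.Linear(msg_dim, 1)

    def forward(self, messages):                # messages: (n_machines, msg_dim)
        return torch.softmax(self.priority(messages).squeeze(-1), dim=0)

# Grouping: machines in the same group share one policy, so the number of
# models equals the number of groups rather than the number of machines.
groups = {0: [0, 1], 1: [2, 3, 4]}              # group id -> machine ids
policies = {g: MachinePolicy(obs_dim=8, n_actions=5) for g in groups}
policy_of = {m: policies[g] for g, ms in groups.items() for m in ms}

coordinator = Coordinator(msg_dim=8)
obs = torch.randn(5, 8)                         # one observation per machine
priorities = coordinator(obs)                   # collaboration via priorities
m = int(torch.argmax(priorities))               # highest-priority machine acts
action_probs = policy_of[m](obs[m])             # its group's shared policy decides
```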
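Finally, the weighted-return extension of item 4 amounts to scalarizing several objectives into one reward, so the single-agent and multi-agent training pipelines above can be reused unchanged. The two objectives and the weights below are illustrative assumptions.

```python
# Weighted multi-objective reward sketch; objectives and weights are examples.
def weighted_reward(delta_makespan, delta_energy, w=(0.7, 0.3)):
    """Scalarize per-step objective increments into a single reward."""
    return -(w[0] * delta_makespan + w[1] * delta_energy)

# E.g. a step that adds 2.0 to the makespan and uses 5.0 energy units:
r = weighted_reward(delta_makespan=2.0, delta_energy=5.0)  # -2.9
```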