| The batch machine is a type of equipment that can process multiple jobs simultaneously, subject to certain constraints. It is widely used in manufacturing industries such as metal processing, semiconductor production, and textile dyeing and finishing. In recent years, the stochastic batch scheduling problem has attracted the attention of many scholars. In this dissertation, we study the batch machine scheduling problem with randomly arriving jobs of non-identical sizes. First, we formulate a mathematical model of the problem based on the physical model and working mechanism of the system. More specifically, the problem is modeled as a semi-Markov decision process or as a continuous-time Markov decision process, depending on whether the processing time is fixed or random. A policy iteration algorithm based on this model is then used to obtain the optimal scheduling policy, with the objective of minimizing the production cost of the system. Second, because a practical production system is difficult to model and the policy iteration algorithm takes too long to solve, we introduce the Q-learning algorithm from reinforcement learning to solve the problem. To address the excessively large action space that Q-learning faces during the solution process, an action set reduction strategy is proposed. Experimental results show that the improved Q-learning algorithm outperforms the original algorithm. Next, a rule-learning-based scheduling method is proposed for larger-scale problems. This method uses heuristic rules to schedule jobs at the lower layer and applies Q-learning at the upper layer to select an appropriate heuristic rule for each system state. To this end, nine heuristic rules of two types are designed according to the characteristics of the system, forming a rule base that Q-learning searches. Simulation experiments show that this method outperforms traditional Q-learning in both optimization ability and computational efficiency, and that its efficiency advantage grows with the problem scale. Finally, the DQN algorithm from deep reinforcement learning is introduced to solve the problem. Because the rule-learning scheduling method may rely on poorly designed heuristic rules in practical production, we explore the feasibility of using an artificial neural network in place of manual analysis. Experimental results show that the DQN algorithm learns autonomously and effectively and achieves good results. |
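The two-layer rule-learning idea (heuristic rules schedule jobs at the lower layer, Q-learning picks a rule per state at the upper layer) can be illustrated with a small sketch. This is a toy model under stated assumptions, not the dissertation's implementation. It assumes a single batch machine with a hypothetical capacity, a fixed processing time, at most one job arriving per step with a random size, and a hypothetical cost made up of per-step holding cost plus a fixed charge per batch. The rule base here holds three hypothetical rules (`rule_wait`, `rule_fifo`, `rule_largest_first`) standing in for the dissertation's nine. The state encoding in `state_of` is also an assumption. Because a started batch occupies the machine for several steps, the update discounts the next-state value by `GAMMA ** elapsed`, in the style of semi-Markov Q-learning.

```python
import random
from collections import defaultdict

# Hypothetical illustration values; none are taken from the dissertation.
CAPACITY = 10        # machine capacity: max total job size in one batch
PROC_TIME = 3        # fixed batch processing time, in time steps
ARRIVAL_PROB = 0.6   # probability that one job arrives in a time step
MAX_SIZE = 4         # job sizes are drawn uniformly from 1..MAX_SIZE
HOLD_COST = 1.0      # cost per waiting job per time step
BATCH_COST = 5.0     # fixed cost of starting a batch
GAMMA = 0.95         # per-step discount factor


def rule_wait(queue):
    """Hypothetical lower-layer rule: start no batch at this epoch."""
    return []


def rule_fifo(queue):
    """Hypothetical rule: add jobs in arrival order, skipping jobs that do not fit."""
    batch, load = [], 0
    for i, size in enumerate(queue):
        if load + size <= CAPACITY:
            batch.append(i)
            load += size
    return batch


def rule_largest_first(queue):
    """Hypothetical rule: add the largest jobs first, skipping jobs that do not fit."""
    batch, load = [], 0
    for i in sorted(range(len(queue)), key=lambda j: -queue[j]):
        if load + queue[i] <= CAPACITY:
            batch.append(i)
            load += queue[i]
    return batch


RULES = [rule_wait, rule_fifo, rule_largest_first]  # toy rule base


def state_of(queue):
    """Assumed coarse state: (capped job count, bucketed total job size)."""
    return (min(len(queue), 6), min(sum(queue), CAPACITY) // 2)


def advance(queue, rule, rng):
    """Apply `rule` at an idle epoch, then simulate until the machine is idle.

    Mutates `queue` in place; returns (discounted cost, elapsed steps).
    """
    chosen = set(rule(queue))
    queue[:] = [s for i, s in enumerate(queue) if i not in chosen]
    cost = BATCH_COST if chosen else 0.0
    elapsed = PROC_TIME if chosen else 1
    for t in range(elapsed):
        cost += (GAMMA ** t) * HOLD_COST * len(queue)  # holding cost of waiting jobs
        if rng.random() < ARRIVAL_PROB:                # random arrival, random size
            queue.append(rng.randint(1, MAX_SIZE))
    return cost, elapsed


def best_rule_index(q, s):
    """Greedy upper-layer choice: the rule with the highest Q-value in state s."""
    return max(range(len(RULES)), key=lambda k: q[s][k])


def train(epochs=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Upper layer: epsilon-greedy tabular Q-learning over rule indices."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(RULES))
    queue = []
    for _ in range(epochs):
        s = state_of(queue)
        if rng.random() < epsilon:
            a = rng.randrange(len(RULES))
        else:
            a = best_rule_index(q, s)
        cost, elapsed = advance(queue, RULES[a], rng)
        s2 = state_of(queue)
        # Reward is the negative cost; the next-state value is discounted by
        # GAMMA ** elapsed because a started batch spans several steps (SMDP-style).
        target = -cost + (GAMMA ** elapsed) * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])
    return q


if __name__ == "__main__":
    q = train()
    for s in sorted(q):
        print(s, RULES[best_rule_index(q, s)].__name__)
```

Running the script prints the rule with the highest learned Q-value for each visited state. With these toy parameters the output only illustrates the mechanism. It says nothing about the dissertation's experimental results. Choosing among a handful of rules, rather than among all feasible job subsets, keeps the upper-layer action space small. This is the property the abstract credits for the method's efficiency on larger problems.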