The flow line is a widely adopted production mode. With more than three machines, the flow-shop scheduling problem is NP-hard, and research on it is of great theoretical and engineering value. Traditional approaches to the scheduling problem, such as mathematical modeling and heuristic or meta-heuristic algorithms, can obtain near-optimal solutions in a short time, but they struggle to cope with dynamic changes in the conditions of resources and tasks. Deep reinforcement learning can take actions in direct response to dynamic states, which makes it better suited to state-responsive manufacturing processes. Therefore, a deep reinforcement learning algorithm is applied to the Non-permutation Flow-Shop Scheduling (NPFS) problem for the first time.

Firstly, the underlying theories are introduced, including Deep Learning (DL) based on neural networks and Reinforcement Learning (RL) based on the Markov Decision Process (MDP), and the framework of the Deep Temporal Difference Network (DTDN) reinforcement learning algorithm is established.

Secondly, the NPFS problem is described. Fifteen manufacturing state features are defined numerically, and a candidate action set consisting of 28 constructive heuristics and dispatching rules is constructed. The reward function is defined according to the objective of minimizing the makespan, so NPFS problems are transformed into MDPs. The proposed approach is applied to F||Cmax benchmark problems and compared with the Simple Constructive Heuristic (SCH) and Ant Colony System (ACS) methods. The algorithm obtains solutions below the upper bounds of the original problems in fewer iterations, and its solution quality is clearly better than that of the compared methods, which validates the effectiveness of the DTDN algorithm.

Thirdly, the Multi-objective Optimization Problem (MOP) model is given and Multi-Objective RL (MORL) is described, including its basic architecture and solution methods. A synthetic objective of minimizing makespan and energy consumption is then established to test the multi-objective DTDN algorithm on the Taillard benchmark problems, using a multiple-policy method with varying parameters. The results show that the approach obtains good Pareto solutions. An improvement is suggested based on a comparative analysis of the experimental outcomes under different learning-rate parameters.

Finally, the dynamic scheduling problem, its commonly used performance indices, and rescheduling policies are described. An NPFS problem with dynamic order arrivals is devised from the Car instances. The experimental results further confirm the dynamic adaptability of the DTDN algorithm under an event-driven rescheduling policy. The concluding part summarizes the primary research results and puts forward prospects for further research.
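
The abstract states that the reward function follows the makespan-minimization objective but does not spell it out. The following is only a plausible sketch of such a formulation; the incremental-reward form is an assumption, not the thesis's stated definition:

\[
C_{\max} = \max_{1 \le j \le n} C_j,
\qquad
r_t = -\bigl(C_{\max}(s_{t+1}) - C_{\max}(s_t)\bigr),
\]

where C_j is the completion time of job j and s_t is the manufacturing state after the t-th dispatching decision. Under this assumed form the cumulative reward telescopes to the negative final makespan, so maximizing the return is equivalent to minimizing C_max.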
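
As a concrete illustration of the decision mechanism summarized above, the minimal sketch below implements a small temporal-difference Q-network over the 15 state features and the 28 candidate actions. Every name, dimension, and hyperparameter is a hypothetical stand-in: the thesis's actual DTDN architecture is not specified in this abstract, and the toy reward follows the assumed negative-makespan-increment form from the previous sketch.

    # Minimal TD(0) Q-network sketch for the NPFS setting described above.
    # Sizes, hyperparameters, and architecture are illustrative assumptions.
    import numpy as np

    N_FEATURES, N_ACTIONS, N_HIDDEN = 15, 28, 64   # 15 state features, 28 rules
    GAMMA, LR, EPSILON = 0.95, 1e-3, 0.1

    rng = np.random.default_rng(0)
    W1 = rng.normal(0, 0.1, (N_FEATURES, N_HIDDEN))
    b1 = np.zeros(N_HIDDEN)
    W2 = rng.normal(0, 0.1, (N_HIDDEN, N_ACTIONS))
    b2 = np.zeros(N_ACTIONS)

    def q_values(state):
        """Forward pass: 15 state features -> one Q-value per candidate rule."""
        h = np.maximum(0.0, state @ W1 + b1)       # ReLU hidden layer
        return h, h @ W2 + b2

    def select_action(state):
        """Epsilon-greedy choice among the 28 heuristics/dispatching rules."""
        if rng.random() < EPSILON:
            return int(rng.integers(N_ACTIONS))
        _, q = q_values(state)
        return int(np.argmax(q))

    def td_update(state, action, reward, next_state, done):
        """One TD(0) step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
        global W1, b1, W2, b2
        h, q = q_values(state)
        _, q_next = q_values(next_state)
        target = reward if done else reward + GAMMA * float(np.max(q_next))
        err = q[action] - target                   # TD error on the chosen action
        grad_q = np.zeros(N_ACTIONS)
        grad_q[action] = err                       # d(0.5 * err^2) / dq = err
        grad_h = (W2 @ grad_q) * (h > 0.0)         # backprop through the ReLU
        W2 -= LR * np.outer(h, grad_q)
        b2 -= LR * grad_q
        W1 -= LR * np.outer(state, grad_h)
        b1 -= LR * grad_h

    # Toy usage on random vectors standing in for real scheduling states.
    s = rng.random(N_FEATURES)
    a = select_action(s)
    s_next = rng.random(N_FEATURES)
    r = -rng.random()                              # assumption: -(makespan increment)
    td_update(s, a, r, s_next, done=False)
    print("chosen rule index:", a)

In a full experiment, each decision step would apply the selected constructive heuristic or dispatching rule to the current partial schedule, recompute the 15 state features, and feed the resulting transition back into the TD update.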