
The Research Of Elevator Dynamic Scheduling Policy Based On Reinforcement Learning Algorithm

Posted on: 2006-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: G S Xing
GTID: 2132360182976672
Subject: Control theory and control engineering
Abstract/Summary:
This thesis studies elevator group scheduling based on reinforcement learning.

Elevator group scheduling has been researched extensively because of its high practical significance. An elevator group system is a type of Discrete Event Dynamic System (DEDS) operating in continuous time and space. To find an effective method for the elevator group scheduling problem, methods for DEDS scheduling are therefore investigated. Based on an overview of DEDS scheduling methods and their applications, the methods are classified into three types: classical methods, intelligent methods, and reinforcement learning methods.

Reinforcement learning, which originated in AI as an approximate method of dynamic programming, has drawn increasing attention from researchers in AI, control theory, and operations research following breakthroughs in its mathematical foundations. This thesis presents the basic theory of reinforcement learning and its history, and analyzes its background and two of its characteristics: it avoids the computational intractability that results from exhaustive sweeps of the state space, and it does not require an environment model for value iteration. The basic model for reinforcement learning methods is then established. In addition, reinforcement learning methods are divided into types, with the classical algorithms of each presented, and a framework of general steps for applying reinforcement learning to practical problems is built to guide future work.

Reinforcement learning, as a method that learns an optimal policy from interaction with the environment, is suitable for large-scale dynamic optimization problems such as elevator group scheduling. After analyzing the field in detail, the thesis summarizes three difficulties in solving this problem: a huge state space, significant uncertainty while the system is running, and computation that is hard to handle as a result of rescheduling. The problem is formulated in the framework of a Markov Decision Process (MDP), and its elements are defined for this specific domain. In applying reinforcement learning, a stochastic action-selection policy and a feedforward neural network are used to handle exploration and generalization of the value function, respectively; both are integrated into the value iteration algorithm known as Q-learning to construct the complete algorithm for elevator group scheduling.

An open, loosely coupled software structure is designed for the algorithm, and the interfaces between functional components are well defined for easy future reuse. Using MATLAB as the main runtime environment speeds up coding and makes debugging more efficient. Simulation experiments are carried out in a virtual environment for elevator group control. With four different traffic flows used for simulation and training, the experimental results demonstrate the algorithm's good learning ability, good performance, and adaptability to different traffic flows in comparison with other algorithms.
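As an illustration of how the three ingredients named in the abstract fit together (a stochastic action-selection policy, a feedforward neural network as value-function approximator, and the Q-learning update), here is a minimal sketch in Python. It is not the thesis's MATLAB implementation. The toy environment (one car, one waiting call, four floors), the Boltzmann form of the stochastic policy, the network size, the reward of -1 per step, and all hyperparameters and names are assumptions made for illustration only.

```python
import math
import random

N_FLOORS = 4            # hypothetical toy building, not from the thesis
ACTIONS = (-1, 0, 1)    # move down, stay, move up
N_HIDDEN = 8            # assumed hidden-layer size

def encode(car, call):
    """One-hot car floor followed by one-hot call floor."""
    x = [0.0] * (2 * N_FLOORS)
    x[car] = 1.0
    x[N_FLOORS + call] = 1.0
    return x

class QNetwork:
    """One-hidden-layer feedforward net: features -> tanh -> one Q per action."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        small = lambda: rng.uniform(-0.1, 0.1)
        self.w1 = [[small() for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[small() for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        q = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, q

    def q_values(self, x):
        return self.forward(x)[1]

    def update(self, x, a, target, lr):
        """One semi-gradient step moving Q(x, a) toward a fixed target."""
        h, q = self.forward(x)
        delta = target - q[a]
        for j, hj in enumerate(h):
            # Backpropagate through the output weight before changing it.
            grad_h = self.w2[a][j] * (1.0 - hj * hj)
            self.w2[a][j] += lr * delta * hj
            for i, xi in enumerate(x):
                self.w1[j][i] += lr * delta * grad_h * xi
            self.b1[j] += lr * delta * grad_h
        self.b2[a] += lr * delta
        return delta

def boltzmann_probs(q, temperature):
    """Stochastic policy: higher Q-values get higher selection probability."""
    m = max(q)
    exps = [math.exp((v - m) / temperature) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(q, temperature, rng):
    return rng.choices(range(len(q)), weights=boltzmann_probs(q, temperature))[0]

def step(car, call, a):
    """Move the car; each step without service costs -1 (assumed reward)."""
    car = min(N_FLOORS - 1, max(0, car + ACTIONS[a]))
    done = car == call
    return car, (0.0 if done else -1.0), done

def train(episodes=300, gamma=0.9, lr=0.02, temperature=0.5, max_steps=20, seed=0):
    rng = random.Random(seed)
    net = QNetwork(2 * N_FLOORS, N_HIDDEN, len(ACTIONS), rng)
    lengths = []
    for _ in range(episodes):
        car, call = rng.sample(range(N_FLOORS), 2)  # two distinct floors
        for t in range(1, max_steps + 1):
            x = encode(car, call)
            a = select_action(net.q_values(x), temperature, rng)
            car, reward, done = step(car, call, a)
            # Q-learning target: bootstrap from the greedy value of the next state.
            target = reward if done else reward + gamma * max(net.q_values(encode(car, call)))
            net.update(x, a, target, lr)
            if done:
                break
        lengths.append(t)
    return net, lengths
```

Calling `train()` returns the trained network and the number of steps each episode took. A real elevator group problem would need a far richer state (multiple cars, hall and car calls, passenger arrival times) and an event-driven simulator, which this sketch deliberately leaves out.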
Keywords/Search Tags: Elevator Group Scheduling, Reinforcement Learning, DEDS, Function Approximation, MDP