Material handling is a pivotal element of the production process and significantly impacts the overall cost of product manufacturing. The Automatic Guided Vehicle (AGV), owing to its flexibility and reliability, has become a preferred tool for intelligent transportation in developed regions such as Japan, Europe, and North America. Task scheduling, the orderly and logical assignment of tasks to each AGV, can enhance material handling efficiency and reduce production costs. To address the task scheduling problem in a warehouse setting, we propose a novel AGV task scheduling method based on transfer reinforcement learning. The research content of this thesis is as follows:

Firstly, we tackle the curse of dimensionality of the joint action space, a prevalent problem in multi-agent reinforcement learning. Our proposed solution, the Action Sampling Q-learning (ASQ) algorithm, adopts a centralized-training, decentralized-execution framework. During centralized training, the ASQ algorithm updates the Q-values of joint actions by sampling only a portion of them rather than iterating over all of them; during the action selection and execution phase, each agent independently selects its own action. Comparative simulations on tasks such as cooperative robot transport and distributed sensor networks demonstrate that the ASQ algorithm, while achieving the same learning outcome, significantly reduces the computational load and consistently converges to the optimal strategy with a 100% success rate.

Secondly, to overcome the low generalization performance inherent in reinforcement learning, we propose the State Transition Similarity-based Transfer Learning (TLSST) algorithm. Built on the ASQ algorithm, it records each state and the transition probabilities of the state-action pairs experienced in both the source and target tasks, and from these computes the cosine similarity between the two tasks. During learning of the target task, the TLSST algorithm transfers the Q-value function from the source task and, in conjunction with the state transition similarity between the two tasks, uses it to guide the learning of the target task. Simulation results show that the TLSST algorithm outperforms the comparative algorithms in convergence speed on cooperative robot transport tasks and consistently converges to the optimal strategy with a 100% success rate.

Finally, an AGV task scheduling platform was designed and implemented to validate the effectiveness of the proposed transfer reinforcement learning method for AGV task scheduling. Simulation results confirm that the TLSST algorithm surpasses the other algorithms in jump-start and convergence ability and exhibits superior generalization performance. To examine the strategy learned by the TLSST algorithm, we used the logistics simulation software Flexsim to visualize the task allocation strategy, which effectively confirmed the optimality of the TLSST algorithm.
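The core idea of the ASQ algorithm's centralized training step can be sketched as follows. This is a minimal illustration only: the abstract does not give the update rule, so the hyperparameters (`ALPHA`, `GAMMA`, `SAMPLE_SIZE`) and the tabular representation are assumptions, not the thesis's actual implementation.

```python
import random
from collections import defaultdict

# Assumed hyperparameters; the abstract does not specify them.
ALPHA, GAMMA = 0.1, 0.9
N_AGENTS, N_ACTIONS = 3, 4
SAMPLE_SIZE = 8  # joint actions sampled per update

Q = defaultdict(float)  # Q[(state, joint_action)] -> value

def sampled_max_q(state):
    """Approximate the max over the joint action space by sampling a
    subset instead of enumerating all N_ACTIONS ** N_AGENTS combinations."""
    samples = [tuple(random.randrange(N_ACTIONS) for _ in range(N_AGENTS))
               for _ in range(SAMPLE_SIZE)]
    return max(Q[(state, a)] for a in samples)

def asq_update(state, joint_action, reward, next_state):
    """Standard Q-learning update, but the bootstrap max is sampled
    rather than computed over the full joint action space."""
    target = reward + GAMMA * sampled_max_q(next_state)
    Q[(state, joint_action)] += ALPHA * (target - Q[(state, joint_action)])
```

The point of the sketch is the cost saving: each update touches only `SAMPLE_SIZE` joint actions instead of all `N_ACTIONS ** N_AGENTS` of them, which is what lets training scale with the number of agents.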
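The similarity-based transfer in TLSST can likewise be sketched. The vector representation (sparse transition probabilities keyed by state-action-next-state triples) and the transfer rule (scaling source Q-values by the similarity) are assumptions inferred from the abstract's description, not the thesis's exact formulation.

```python
import math
from collections import defaultdict

def cosine_similarity(p, q):
    """Cosine similarity between two sparse transition-probability
    vectors, each keyed by (state, action, next_state) triples."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def transfer_q(source_q, similarity):
    """Initialize the target task's Q-table from the source task's,
    scaled by the tasks' state-transition similarity so that transfer
    is strong for similar tasks and weak for dissimilar ones."""
    return defaultdict(float, {k: similarity * v for k, v in source_q.items()})
```

For example, a source and target task with identical recorded transitions yield a similarity of 1.0, so the source Q-values carry over unchanged; a similarity near 0 leaves the target to learn almost from scratch.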