
Research on RoboCup Local Strategy Based on Multi-Agent Reinforcement Learning

Posted on: 2013-01-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2248330371994104 | Subject: Computer application technology
Abstract/Summary:
Reinforcement learning has become a central paradigm for solving learning-control problems in artificial intelligence. Traditional reinforcement learning, however, suffers from slow convergence and cannot be applied effectively in settings with uncertain environments, multiple agents, and multiple goals; RoboCup training exhibits all of these difficulties. To address the slow convergence and the multi-goal nature of the task, this paper proposes several improved algorithms. The main research contents are as follows:

i. The expected cumulative reward is not suitable for every application: accumulating many small rewards slows convergence, and the influence of a sub-optimal policy takes a long time to fade. To address these problems, this paper proposes a non-cumulative reward and a reinforcement learning model built on it, and applies the resulting algorithm to RoboCup shooting training. Experimental results show that the proposed algorithm outperforms reinforcement learning methods based on the expected cumulative reward. (A minimal sketch of this idea follows the abstract.)

ii. R-Learning suffers from slow convergence and sensitivity to its parameters. To speed up convergence, an improved R-Learning algorithm is proposed that uses a BP neural network as a function approximator to generalize over the state space. Experimental results on Keepaway show that the proposed algorithm converges faster and generalizes well. (The baseline update rule is sketched after the abstract.)

iii. To address the multi-goal problem in RoboCup, a novel multi-goal reinforcement learning algorithm, LRGM, is proposed. The algorithm estimates the lost reward of the greatest mass of sub-goals and trades off the long-term rewards of the sub-goals to obtain a composite policy. (The merging step is sketched after the abstract.)

iv. A B error function for the single learning module, based on the MSBR (mean squared Bellman residual) error function, is proposed. The B error function guarantees convergence of value prediction with nonlinear function approximation. The action-selection probability and the parameter α are also improved with respect to the B error function. Experimental results on 2 vs. 2 shooting show that LRGM-Sarsa(λ) is more stable and converges faster.
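The abstract does not give the exact form of the non-cumulative reward in item i. The minimal tabular sketch below assumes one common max-based reading, in which the usual bootstrap target r + γ·max Q is replaced by max(r, γ·max Q), so a state's value tracks the best single reward reachable from it rather than a discounted sum; the thesis's actual model may differ.

```python
import random
from collections import defaultdict

# Tabular Q-learning with a max-based (non-cumulative) target.
# Assumption: "non-cumulative reward" is read as V(s) = E[max_t r_t],
# so low intermediate rewards no longer dilute the value estimate.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def update_noncumulative(state, action, reward, next_state, actions):
    """One update under the non-cumulative target max(r, gamma * max_a' Q(s', a'))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = max(reward, GAMMA * best_next)  # replaces r + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

def epsilon_greedy(state, actions):
    """Standard epsilon-greedy action selection over the current Q table."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```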
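Item ii builds on R-Learning, Schwartz's average-reward algorithm. For reference, the sketch below shows the standard tabular form of its update; per the abstract, the thesis replaces the table with a BP neural network to generalize over Keepaway's continuous state space, a step not reproduced here.

```python
from collections import defaultdict

# Standard tabular R-Learning (average-reward RL), the baseline of item ii.
ALPHA, BETA = 0.1, 0.01   # step sizes for Q and for the average reward rho
Q = defaultdict(float)    # Q[(state, action)] -> average-adjusted value
rho = 0.0                 # running estimate of the average reward per step

def r_learning_update(state, action, reward, next_state, actions):
    global rho
    # Record whether the taken action was greedy before updating Q.
    greedy = Q[(state, action)] == max(Q[(state, a)] for a in actions)
    best_next = max(Q[(next_state, a)] for a in actions)
    delta = reward - rho + best_next - Q[(state, action)]
    Q[(state, action)] += ALPHA * delta
    if greedy:
        # rho is adjusted only on non-exploratory steps.
        rho += BETA * delta
```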
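Item iii names only the ingredients of LRGM. Assuming "greatest mass" refers to the usual merging rule for sub-goal modules (pick the action with the largest summed value across modules), the sketch below shows that merging step plus one plausible "lost reward" measure; both are illustrative guesses, not the thesis's definitions.

```python
# Greatest-mass merging over per-sub-goal value functions, plus a per-module
# "lost reward": the value a sub-goal sacrifices when the composite action
# is taken instead of its own best action. LRGM's actual lost-reward
# estimation and trade-off are the thesis's contribution and may differ.
def greatest_mass_action(state, actions, sub_goal_qs):
    """sub_goal_qs: list of per-sub-goal Q functions, q(state, action) -> float."""
    return max(actions, key=lambda a: sum(q(state, a) for q in sub_goal_qs))

def lost_reward(state, action, actions, q):
    """Hypothetical helper: gap between this sub-goal's best value and the
    value of the composite action chosen for it."""
    return max(q(state, a) for a in actions) - q(state, action)
```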
Keywords/Search Tags: reinforcement learning, multi-goal, non-cumulative reward, RoboCup training