
Research on RoboCup Local Strategy Based on Multi-Agent Reinforcement Learning

Posted on: 2013-01-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2248330371994104 | Subject: Computer application technology
Abstract/Summary:
Reinforcement learning has become a central paradigm for solving learning-control problems in artificial intelligence. Traditional reinforcement learning, however, suffers from slow convergence and cannot be applied effectively in settings with uncertain environments, multiple agents, and multiple goals; RoboCup training exhibits all of these difficulties. To address the slow convergence and the multi-goal nature of the task, this paper proposes several improved algorithms. The main research contents are as follows:

i. The expected cumulative reward is not suitable for every application: accumulating many small rewards slows convergence, and the influence of a sub-optimal policy takes a long time to fade. To address these problems, this paper proposes a non-cumulative reward and a reinforcement learning model built on it, and applies the resulting algorithm to RoboCup shooting training. Experimental results show that the proposed algorithm outperforms reinforcement learning methods based on the expected cumulative reward. (A minimal sketch of this idea follows the abstract.)

ii. R-Learning suffers from slow convergence and sensitivity to its parameters. To speed up convergence, an improved R-Learning algorithm is proposed that uses a BP neural network as a function approximator to generalize over the state space. Experimental results on Keepaway show that the proposed algorithm converges faster and generalizes well. (The baseline update rule is sketched after the abstract.)

iii. To address the multi-goal problem in RoboCup, a novel multi-goal reinforcement learning algorithm, LRGM, is proposed. The algorithm estimates the lost reward of the greatest mass of sub-goals and trades off the long-term rewards of the sub-goals to obtain a composite policy. (The merging step is sketched after the abstract.)

iv. A B error function for the single learning module, based on the MSBR (mean squared Bellman residual) error function, is proposed. The B error function guarantees convergence of value prediction with nonlinear function approximation. The action-selection probability and the parameter α are also improved with respect to the B error function. Experimental results on 2 vs. 2 shooting show that LRGM-Sarsa(λ) is more stable and converges faster.
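The abstract does not give the exact form of the non-cumulative reward in item i. The minimal tabular sketch below assumes one common max-based reading, in which the usual bootstrap target r + γ·max Q is replaced by max(r, γ·max Q), so a state's value tracks the best single reward reachable from it rather than a discounted sum; the thesis's actual model may differ.

```python
import random
from collections import defaultdict

# Tabular Q-learning with a max-based (non-cumulative) target.
# Assumption: "non-cumulative reward" is read as V(s) = E[max_t r_t],
# so low intermediate rewards no longer dilute the value estimate.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def update_noncumulative(state, action, reward, next_state, actions):
    """One update under the non-cumulative target max(r, gamma * max_a' Q(s', a'))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = max(reward, GAMMA * best_next)  # replaces r + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

def epsilon_greedy(state, actions):
    """Standard epsilon-greedy action selection over the current Q table."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```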
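Item ii builds on R-Learning, Schwartz's average-reward algorithm. For reference, the sketch below shows the standard tabular form of its update; per the abstract, the thesis replaces the table with a BP neural network to generalize over Keepaway's continuous state space, a step not reproduced here.

```python
from collections import defaultdict

# Standard tabular R-Learning (average-reward RL), the baseline of item ii.
ALPHA, BETA = 0.1, 0.01   # step sizes for Q and for the average reward rho
Q = defaultdict(float)    # Q[(state, action)] -> average-adjusted value
rho = 0.0                 # running estimate of the average reward per step

def r_learning_update(state, action, reward, next_state, actions):
    global rho
    # Record whether the taken action was greedy before updating Q.
    greedy = Q[(state, action)] == max(Q[(state, a)] for a in actions)
    best_next = max(Q[(next_state, a)] for a in actions)
    delta = reward - rho + best_next - Q[(state, action)]
    Q[(state, action)] += ALPHA * delta
    if greedy:
        # rho is adjusted only on non-exploratory steps.
        rho += BETA * delta
```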
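Item iii names only the ingredients of LRGM. Assuming "greatest mass" refers to the usual merging rule for sub-goal modules (pick the action with the largest summed value across modules), the sketch below shows that merging step plus one plausible "lost reward" measure; both are illustrative guesses, not the thesis's definitions.

```python
# Greatest-mass merging over per-sub-goal value functions, plus a per-module
# "lost reward": the value a sub-goal sacrifices when the composite action
# is taken instead of its own best action. LRGM's actual lost-reward
# estimation and trade-off are the thesis's contribution and may differ.
def greatest_mass_action(state, actions, sub_goal_qs):
    """sub_goal_qs: list of per-sub-goal Q functions, q(state, action) -> float."""
    return max(actions, key=lambda a: sum(q(state, a) for q in sub_goal_qs))

def lost_reward(state, action, actions, q):
    """Hypothetical helper: gap between this sub-goal's best value and the
    value of the composite action chosen for it."""
    return max(q(state, a) for a in actions) - q(state, action)
```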
Keywords/Search Tags: reinforcement learning, multi-goal, non-cumulative reward, RoboCup training