Font Size: a A A

Research On Planning Based On Markov Decision Theory

Posted on:2009-06-01Degree:DoctorType:Dissertation
Country:ChinaCandidate:C J FanFull Text:PDF
GTID:1100360272962513Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years,agent and multi-agent planning have been new research hotspots in the field of Artificial Intelligence with a broad prospect.In this paper,the research is on the basis of Markov decision process and its related theories,study the issues and solutions of these theories when contact with the real-world application,present some theoretical analysis and improvement on a class of basic Markov decision-making algorithm.The research in this paper is mainly related to the following content:(1) A systematic study of the foundation models and algorithms related to the agent and multi-agent planning problem with uncertainties.Among the models,first of all,is the Markov decision processes model as the most basic one,and then is the partially observable Markov decision processes model which take into account of the observation uncertainty,and further is the decentralized partially observable Markov decision processes mode with multi-agent cooperation joined and the partially observable stochastic games model with multi-agent competition involved.Among the algorithms,a survey and some comparative analysis are carried out following such a classification that,one is a backward iteration fashion and the other is a forward search fashion.At last,further discussion is made about the Semi-Markov decision processes model and the Option theory involving temporal abstraction,which is a foundation for the design of hierarchical planning framework and its algorithms.(2) Robocup simulation 2D is regarded as a standard platform for research on multi-agent planning problem in large scale uncertain environment.With some necessary explanation about the platform itself,address and analyze the design methods on some sub-problem for the overall planning in such kinds of real world application.By employing existing Markov decision theory to model these issues, give the platform more general research significance.(3) Option theory involves the concept of temporal abstraction and can be applied for hierarchical planning,which gives a new research direction for Markov decision theory being capable to contract more with the real world application. Toward the large scale multi-agent planning problem with the observation uncertainty like Robocup simulation 2D,on the basis of the partially observable stochastic games model,by combining the technique of policy heuristic,belief state compression, factored representation and Option theory,give a new decision-making framework with a concept of behavior generator,and design a new real-time heuristic search algorithm consequently,which is aimed to quickly find a feasible solution online. Finally,show the practical effect of testing these new methods on Robocup simulation 2D as a standard platform.(4) On the basis of the Option theory,when applying hierarchical planning toward the large-scale problem,the capability of solving sub-problem efficiency is also crucial.Real-Time Dynamic Programming combines the technology of heuristic search and value iteration to solve the Markov decision problems.The core issues of the algorithm design are the branch strategy and the convergence criterion.The branch strategy influences the convergence speed of value iteration;the convergence criterion is applied to determine the optimal solution.Several typical convergence criterions are compared and analyzed,afterward,a new one named the optimal action criterion and a corresponding branch strategy are proposed on the basis of the upper and lower bound theory.This criterion guarantees that the agent can act earlier in a real-time decision process while an optimal policy with sufficient precision still remains.It can be proven that,under certain conditions,one can obtain an optimal policy with arbitrary precision using such an incremental method.With these new techniques,a Bounded Incremental Real-Time Dynamic Programming algorithm is designed.In the experiments of two typical real-time simulation systems,BI-RTDP outperforms the other state-of-the-art RTDP algorithms.Finally,through the study on the asynchronous value iteration mechanism of RTDP,improve its capability of dealing with cycle in the search graph,and show the off-line experimental results.
Keywords/Search Tags:multi-agent, MDP, POSG, SMDP, Option, Robocup, RTDP
PDF Full Text Request
Related items