| Planning under uncertain and dynamic environments is an essential capability for autonomous robots.Partially observable Markov decision processes(POMDPs)provide a rich mathematical framework for solving such problems,and have been applied to different robotic tasks such as self-driving car navigation and manipulation with robot hands.While there is dramatic progress in solving discrete POMDPs,works on continuous POMDPs have been limited.For the shortcomings of current algorithms,this paper presents three novel ones:i.In order to address the inefficient a priori discretization of the continuous-state space as a grid,this paper presents a novel efficient algorithm for continuous POMDPs,named GPG.GPG samples both a robot’s state space and the corresponding belief space.At the same time,GPG deals with the problems in continuous action and observation spaces using a sampled max operator and generalized policy graphs.Preliminary experimental results indicate that GPG is a promising new approach for robot motion planning under uncertainty.ii.In order to address the size of a policy graph in Monte Carlo value iteration grows over time for continuous-state POMDPs,which drastically reduces the performance of the algorithm,this paper presents an Optimized Monte Carlo Value Iteration(OMCVI).OMCVI optimizes the addition of nodes and prunes the dominated or redundant nodes.It constructs more compact policy graphs with comparable qualities.iii.In order to address the inefficiency at heuristic search stage for traditional algorithms for continuous-state and large observation POMDPs,this paper presents a novel approach,called Gingko Leaf Search(GLS).In the forward exploration phase of traditional algorithms,only the outcome that has the highest potential impact is searched.GLS allows the selection of more than one outcome when their potential impacts are close to the highest one.At the same time,it adaptively adjusts the number of the selected outcomes.Compared with the traditional algorithms,GLS can save considerable time to propagate the bound improvement of beliefs in deep levels of the search tree to the root belief because of fewer point-based value backups.Then we give the proof of its convergence.Experiments show that GLS owns faster convergence rate and better performance. |