| Robot decision-making is an important problem in artificial intelligence. Its difficulty lies mainly in the large scale of the state space: to obtain agents that make good decisions in every state, a method requires a large amount of storage and computation. At present, deep reinforcement learning methods are widely used in decision and control, but their performance degrades as the state space grows. It is therefore of great theoretical and practical significance to find a reinforcement-learning-based method that effectively reduces the state space, improves generalization, and solves more complex robot decision-making and control problems. Compared with reinforcement learning, planning methods perform better over longer time horizons. By integrating the respective advantages of reinforcement learning and planning, this paper designs a method that combines reinforcement learning, planning, parameter optimization, and an evolutionary algorithm. The method constructs an agent that can learn at both the reinforcement learning level and the planning level. Specifically, an agent combining planning and reinforcement learning places path points in the navigation scene to assist decision-making. With these auxiliary path points, the agent can reach distant targets that are otherwise difficult to reach, and it completes navigation tasks better than an agent that makes decisions through reinforcement learning alone. Experiments with the agent are carried out in a simple scene. The results show that the proposed method greatly reduces the scale of the problem and makes it easier to solve problems that are difficult to solve through reinforcement learning alone. Building on the combination of planning and reinforcement learning, this paper goes on to study the hyperparameter optimization 
problem in the combined system. In particular, the number of path points and the distance between them have an important impact on the navigation performance of the system. The hyperparameters of such systems usually have to be set manually, with a large number of experiments needed to determine good values. Although some existing automatic search methods can in theory find the optimal values, they are very costly. To address this, this paper proposes a hyperparameter search method that automatically searches for the optimal hyperparameter values and has the advantage of fast search speed. Having designed a fast hyperparameter search method, this paper further studies the optimization of the non-parametric part of the model. In this system, one non-parametric part is the structure of the path points. Because the navigation environment contains obstacles, path points should be placed more densely near obstacles, while elsewhere, such as at the edge of the environment, they can be placed sparsely. To meet this requirement, this paper adopts a dynamic path point maintenance method, similar to an evolutionary method, that places path points automatically. Experimental results show that the method achieves this goal and effectively avoids the random dispersion of path points. |
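
The placement requirement described above (dense path points near obstacles, sparse ones elsewhere) can be illustrated with a minimal sketch. This is not the maintenance method of the paper, whose details are not given here; it is a simple greedy propose-and-filter step chosen only for illustration. The function names, the linear spacing rule, the point-obstacle model, and all numeric parameters are assumptions.

```python
import math
import random


def nearest_obstacle_distance(point, obstacles):
    """Euclidean distance from `point` to the closest obstacle centre."""
    return min(math.dist(point, obs) for obs in obstacles)


def min_spacing(point, obstacles, near=0.5, far=2.0, radius=3.0):
    """Required spacing around `point` (hypothetical rule).

    Grows linearly from `near` (at an obstacle) to `far`
    (at distance `radius` or more from every obstacle).
    """
    d = min(nearest_obstacle_distance(point, obstacles), radius)
    return near + (far - near) * d / radius


def maintain_path_points(path_points, obstacles, n_candidates=200,
                         bounds=(0.0, 10.0), seed=0):
    """One maintenance round: propose random candidates in a square
    environment and keep each one only if it does not crowd the
    path points already kept."""
    rng = random.Random(seed)
    kept = list(path_points)
    for _ in range(n_candidates):
        cand = (rng.uniform(*bounds), rng.uniform(*bounds))
        spacing = min_spacing(cand, obstacles)
        # Accept the candidate only if every kept point is far enough away.
        if all(math.dist(cand, p) >= spacing for p in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    obstacles = [(5.0, 5.0)]  # assumed single point obstacle
    points = maintain_path_points([], obstacles)
    near_count = sum(
        nearest_obstacle_distance(p, obstacles) < 2.0 for p in points
    )
    print(len(points), "path points kept,", near_count, "within 2.0 of the obstacle")
```

Because the required spacing shrinks close to an obstacle, candidates there are accepted more readily, so the kept points cluster around obstacles and thin out toward the edge of the environment. Repeating such rounds while also removing poorly performing points would move the sketch closer to an evolutionary-style scheme, but that extension is likewise an assumption.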