| Traditional path planning methods rely on prior knowledge of the environment model, so they cannot be widely applied to unknown environments or complex tasks. In recent years, deep reinforcement learning has been applied to motion planning in high-dimensional robot environments, and substantial breakthroughs have been made in areas such as autonomous driving, mobile robot navigation, and robotic arm trajectory planning. To improve the generalization of path planning methods, this paper applies deep reinforcement learning with adaptive maximum entropy adjustment to path planning tasks, enabling an agent to autonomously plan optimal paths for different tasks. The main contributions of this paper can be summarized as follows. (1) The reward function largely determines the convergence rate in deep reinforcement learning. To avoid the reward sparsity problem, a combined reward system suited to path planning problems is proposed. A goal-guided term, a penalty term, and an additional reward are first defined, and these terms are then combined in different proportions into a single overall reward. For the comparison experiments, three types of experimental scenarios of progressively increasing complexity and difficulty are designed. A generic combined reward system for path planning is obtained, which effectively solves the problem of policy non-convergence caused by sparse rewards in reinforcement learning. Finally, the generality of the proposed combined reward system is verified in several experiments with complex scenarios. (2) To address the difficulty of balancing exploration and exploitation in deep reinforcement learning, a deep reinforcement learning algorithm with adaptive maximum entropy adjustment is proposed. The proposed method adjusts the temperature parameter automatically, so that the entropy can vary across states to control the degree of exploration, which reduces the likelihood of learning suboptimal policies. The proposed method effectively improves the balance between exploration and exploitation in deep reinforcement learning, and its effectiveness and superiority are verified in multiple experiments. |
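
The two ideas summarized above can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the weights `w_goal`, `w_penalty`, and `w_bonus`, the distance-progress form of the goal-guided term, and the unit penalty and bonus values are all assumptions chosen for illustration. The temperature update follows the commonly used soft actor-critic style rule with a target entropy, which may differ from the rule the paper actually uses.

```python
import math


def combined_reward(pos, prev_pos, goal, collided, reached,
                    w_goal=1.0, w_penalty=1.0, w_bonus=1.0):
    """Weighted sum of three reward terms (hypothetical form and weights)."""
    # Goal-guided term: positive when the agent moved closer to the goal.
    goal_term = math.dist(prev_pos, goal) - math.dist(pos, goal)
    # Penalty term: assumed unit penalty on collision.
    penalty_term = -1.0 if collided else 0.0
    # Additional reward: assumed unit bonus on reaching the goal.
    bonus_term = 1.0 if reached else 0.0
    return w_goal * goal_term + w_penalty * penalty_term + w_bonus * bonus_term


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient step on log(alpha), where alpha is the temperature.

    Assumed loss: L = -log_alpha * mean(log_prob + target_entropy).
    Its gradient w.r.t. log_alpha is -mean(log_prob + target_entropy),
    so gradient descent adds lr * mean(log_prob + target_entropy).
    """
    gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha + lr * gap


if __name__ == "__main__":
    # The agent moves from (2, 0) to (1, 0) toward a goal at the origin.
    print(combined_reward((1.0, 0.0), (2.0, 0.0), (0.0, 0.0), False, False))
    # Sampled entropy (0.5) is below the target (1.0): temperature rises.
    print(math.exp(update_log_alpha(0.0, [-0.5, -0.5], target_entropy=1.0)))
```

Under this rule, the temperature increases when the policy's sampled entropy falls below the target, which encourages exploration, and decreases when the entropy exceeds the target. The dense goal-guided term gives the agent a learning signal at every step, rather than only when the goal is reached.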