Simultaneous Localization and Mapping (SLAM) is one of the most fundamental core technologies for mobile robots. Traditional active SLAM frameworks are prone to local optima, slow state prediction, and poor generalization. This paper proposes an active SLAM algorithm based on meta-reinforcement learning to realize autonomous navigation, localization, and map construction for robots in unknown, complex environments. The central research question is how to improve the robot's motion decision-making in an unknown environment so that it can complete localization and map construction without collisions. The specific research content is as follows.

Firstly, an active SLAM algorithm based on MQL (Meta Q-Learning) is proposed to address the Q-value overestimation, sub-optimal policies, and poor generalization common in traditional reinforcement learning algorithms. A neural-network decision module is designed, and past trajectories are encoded as a context-based latent variable that is combined with meta-learning so the policy adapts quickly to new tasks. The robot motion model is analyzed and the three elements of the reinforcement learning problem are defined. Motion commands are then generated from the environmental data sensed by the LiDAR, and the environment map is constructed with the Gmapping algorithm. Simulation results show that the proposed algorithm can avoid static and dynamic obstacles in real time without a prior map and complete the map-building task.

Secondly, for the sparse-reward problem caused by dynamic changes and sensor drift in complex unknown environments, an active SLAM algorithm based on SR-MQL (Sparse Reward MQL) is proposed. A potential-based reward function is introduced, target points are set randomly, and reward shaping in the meta-training stage guides the robot to traverse the environment, which improves its exploration ability. Meanwhile, to handle the sparse rewards caused by sensor drift, the gap between the real and simulated environments, and missing reward values, an advantage function is introduced in the adaptation stage to identify dynamic changes and internalize rewards; the current policy is then evaluated and the gradient computed, which improves the decision-making ability of the algorithm. Simulation and experimental results show that the improved algorithm converges faster, achieves a higher success rate, and produces more stable and efficient robot motion.

Finally, PER-MQL (Prioritized Experience Replay MQL) is proposed to address the low utilization of sampled data and the long training time associated with the reinforcement learning experience replay pool. Sampling weights are designed, bias estimation is used to maximize performance on new tasks in meta-learning, and experiences with inaccurate Q-value estimates are assigned lower weights, bringing the policy closer to the optimal action decision and further improving data utilization. In addition, combined with transfer learning, the trained policy parameters are transferred to the new environment as initial parameters, which significantly improves generalization. Simulation and experimental results show that the proposed algorithm improves the total reward, the number of planning steps, and the per-step reward in both static and dynamic environments; it converges faster and requires less planning time, making it well suited to complex new environments.
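
The context-based latent variable described in the first part can be illustrated with a minimal sketch. The snippet below assumes a GRU encoder over recent (state, action, reward) tuples whose output is concatenated with the current state before the Q-network; the class names, dimensions, and the choice of a GRU are illustrative assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes a window of past (state, action, reward) tuples into a latent
    context vector with a GRU (an assumed realization of the context variable)."""

    def __init__(self, state_dim, action_dim, context_dim=32):
        super().__init__()
        self.gru = nn.GRU(state_dim + action_dim + 1, context_dim, batch_first=True)

    def forward(self, trajectory):
        # trajectory: (batch, T, state_dim + action_dim + 1)
        _, h = self.gru(trajectory)   # h: (1, batch, context_dim)
        return h.squeeze(0)           # latent context variable

class ContextConditionedQNetwork(nn.Module):
    """Q-network conditioned on the latent context in addition to the state."""

    def __init__(self, state_dim, num_actions, context_dim=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state, context):
        return self.net(torch.cat([state, context], dim=-1))
```

At adaptation time, the same encoder can summarize the few trajectories collected in the new environment, so the Q-network's behavior shifts with the context rather than requiring retraining from scratch.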
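
As a concrete illustration of the potential-based reward shaping used in SR-MQL, the sketch below applies the standard form r' = r + γΦ(s') − Φ(s), which preserves the optimal policy (Ng et al., 1999) while densifying a sparse reward. The choice of Φ as the negative Euclidean distance to the current target point is an assumption for illustration; the thesis does not specify its exact form.

```python
import numpy as np

def potential(position, goal):
    """Phi(s): negative Euclidean distance to the current target point
    (a hypothetical potential; the exact form used in the thesis may differ)."""
    return -np.linalg.norm(np.asarray(goal) - np.asarray(position))

def shaped_reward(base_reward, prev_pos, curr_pos, goal, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * potential(curr_pos, goal) - potential(prev_pos, goal)

# Example: moving toward the goal yields a small positive shaping term
# even when the base reward is zero (sparse).
r = shaped_reward(0.0, prev_pos=[0.0, 0.0], curr_pos=[0.5, 0.0], goal=[2.0, 0.0])
```

Because the shaping term telescopes along any trajectory, the randomly placed target points can guide exploration during meta-training without changing which policies are optimal.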
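
The sampling weights in PER-MQL can be contrasted with a minimal version of standard proportional prioritized experience replay (Schaul et al., 2016), sketched below: priorities follow the TD error and importance-sampling weights correct the induced bias. This is a generic baseline for reference only; the thesis's own weighting, which assigns lower weights to experiences with inaccurate Q-value estimates, and its bias-estimation step for meta-learning are not reproduced here.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (generic baseline)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities, self.pos = [], np.zeros(capacity), 0

    def add(self, transition, td_error):
        # Priority grows with the magnitude of the TD error.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights (N * P(i))^-beta, normalized by the max.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights
```

For the transfer-learning step, the trained policy parameters would simply be loaded as the initialization in the new environment (e.g., restoring the network weights before fine-tuning), which is what allows the reported reduction in training time.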