When a pipe-climbing or wall-climbing robot with a complex structure operates in a complex environment, accomplishing its task often requires an elaborate control strategy, which easily becomes overly complicated and yields poor control performance. Deep reinforcement learning algorithms can learn complex control strategies by training through interaction with the environment, and have therefore become a research hotspot in recent years. In this paper, a control strategy based on the DDPG deep reinforcement learning algorithm was proposed for the improved MCR-Ⅰ robot.

First, the pipe-climbing control strategy of the MCR-Ⅰ robot based on the DDPG algorithm was studied. To address the slow convergence speed and low sample-utilization rate of the DDPG algorithm, a layered composite reward mechanism was proposed. Learning-curve theory was introduced to improve the reward mechanism by dynamically varying the range of the target area, so that the agent obtains more effective experience at the beginning of training. This accelerates convergence and ultimately enables the MCR-Ⅰ robot to climb the pipe stably.

Second, the wall-climbing navigation control strategy of the MCR-Ⅰ robot based on the DDPG algorithm was studied. Because the environment is only partially observable during navigation, the agent cannot obtain complete environment information and thus struggles to learn a good control strategy. An improved DDPG algorithm based on an LSTM and an asymmetric actor-critic network was proposed. In this method, the LSTM module performs memory-based reasoning to learn hidden information; meanwhile, the actor network uses only lidar data as its state-space input, while the critic network is trained on the complete state of the simulation environment, forming an asymmetric network. This accelerates training and improves the success rate, so that the MCR-Ⅰ robot can navigate autonomously, avoid obstacles on complex wall surfaces, and reach the specified target point stably.

The simulation environment was established in the ROS robot operating system with Gazebo, and the training-algorithm framework was built with TensorFlow and Gym to verify the superiority of the proposed control strategies.
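The layered composite reward with a dynamically varying target area could be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the schedule constants (`r_start`, `r_final`, `decay`) and the dense-plus-sparse reward layering are assumptions chosen to show the idea of an easy target early in training that tightens as episodes accumulate.

```python
import math

def target_radius(episode, r_start=0.5, r_final=0.05, decay=0.01):
    """Shrink the target-area radius as training progresses (assumed
    exponential schedule inspired by learning-curve theory), so early
    episodes reach the goal region more often and yield useful experience."""
    return r_final + (r_start - r_final) * math.exp(-decay * episode)

def layered_reward(dist_to_goal, episode, reached_bonus=10.0, step_penalty=-0.01):
    """Layered composite reward: a dense distance-shaping layer plus a
    sparse terminal bonus when inside the (shrinking) target area."""
    reward = step_penalty - dist_to_goal          # dense shaping layer
    if dist_to_goal <= target_radius(episode):
        reward += reached_bonus                   # sparse success layer
    return reward
```

Early on, a state 0.4 m from the goal already earns the bonus; after many episodes the same state no longer counts as success, forcing finer control.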
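The asymmetric actor-critic structure with an LSTM actor could be sketched in TensorFlow (the framework named above). The layer sizes, sequence length, and state dimensions below are illustrative assumptions, not the thesis's actual architecture; the point is the asymmetry: the actor sees only a sequence of lidar scans, while the critic is trained with the full simulator state as privileged information.

```python
import tensorflow as tf

# Assumed dimensions for illustration only
LIDAR_DIM, FULL_STATE_DIM, ACT_DIM, SEQ_LEN = 24, 32, 2, 8

def build_actor():
    """Actor input: a short history of lidar scans. The LSTM layer
    provides memory to compensate for partial observability."""
    obs_seq = tf.keras.Input(shape=(SEQ_LEN, LIDAR_DIM))
    x = tf.keras.layers.LSTM(64)(obs_seq)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    action = tf.keras.layers.Dense(ACT_DIM, activation="tanh")(x)
    return tf.keras.Model(obs_seq, action)

def build_critic():
    """Critic input: the complete simulation state plus the action.
    Using privileged full-state information only at training time
    makes the actor-critic pair asymmetric."""
    state = tf.keras.Input(shape=(FULL_STATE_DIM,))
    action = tf.keras.Input(shape=(ACT_DIM,))
    x = tf.keras.layers.Concatenate()([state, action])
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    q_value = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model([state, action], q_value)
```

At deployment only the actor runs on the robot, so the critic's dependence on full simulator state costs nothing at test time.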