In recent years, autonomous underwater vehicles (AUVs) have played an increasingly important role in marine resource development, seabed salvage, military applications, and other fields. Facing the complex and changeable ocean environment, an AUV must have strong self-learning ability to perceive and evaluate its surroundings and make decisions autonomously. In this paper, deep reinforcement learning is applied to AUV motion control tasks to address the poor generalization and difficult parameter tuning of traditional control algorithms, and a series of studies is carried out around the AUV trajectory tracking task.

First, taking the REMUS AUV as a prototype, a dynamic model is established, and the equations of motion are decoupled into horizontal-plane and vertical-plane subsystems according to their motion characteristics. To verify the soundness of the model, and with reference to standard submarine maneuverability trials, the following tests were carried out on the established AUV model: (1) a steady turning test in the horizontal plane; (2) a straight-line depth control test in the vertical plane; (3) a spatial (three-dimensional) maneuvering test. Analysis of these experiments shows that the model exhibits normal maneuverability and can support the trajectory tracking control task.

The horizontal-plane, vertical-plane, and three-dimensional trajectory tracking problems of the AUV are then each formulated as a Markov decision process. To address the low data utilization and sparse rewards of traditional reinforcement learning algorithms on continuous control tasks, the update of the SAC algorithm's value function is improved and a new experience replay mechanism is introduced; trajectory tracking simulation experiments of the AUV in the horizontal plane are then carried out.

Because the AUV's vertical-plane motion is affected by the restoring moment, its attitude is difficult to adjust. Main-line and auxiliary reward functions are therefore designed to prevent abnormal behaviors such as "turning in place" early in training, and the Actor-Critic network structure is improved to increase the accuracy of AUV state evaluation and the stability of the algorithm. This algorithm is then used for trajectory tracking simulation experiments in the vertical plane.

In three-dimensional space, the AUV's motion state involves too many variables, which makes training excessively long or even infeasible. Two improvements are proposed: (1) optimize the state space to reduce the input dimension of the neural network; (2) adopt a distributed reinforcement learning algorithm with a multi-threaded execution mechanism, reducing dependence on the GPU and making full use of CPU resources. Finally, trajectory tracking simulation experiments in three-dimensional space verify the effectiveness of the algorithm.
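For reference, the horizontal-plane decoupling described above is conventionally written as a three-degree-of-freedom (surge, sway, yaw) subsystem. A standard rigid-body form in Fossen's notation is sketched below for illustration only; the thesis's actual REMUS hydrodynamic coefficients are not reproduced here:

\[
\begin{aligned}
m\,(\dot{u} - v r - x_g r^{2}) &= X,\\
m\,(\dot{v} + u r + x_g \dot{r}) &= Y,\\
I_z \dot{r} + m\,x_g (\dot{v} + u r) &= N,
\end{aligned}
\]

where \(u\) and \(v\) are the surge and sway velocities, \(r\) is the yaw rate, \(x_g\) is the longitudinal coordinate of the center of gravity, \(I_z\) is the yaw moment of inertia, and \(X\), \(Y\), \(N\) collect the hydrodynamic and control forces and moment.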
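The "new experience replay mechanism" is not specified further in this summary; a minimal sketch, assuming prioritized experience replay as the mechanism, is given below. All names and hyperparameters (PrioritizedReplayBuffer, alpha, beta) are illustrative, not taken from the thesis.

import numpy as np

class PrioritizedReplayBuffer:
    """Samples transitions in proportion to their TD error (illustrative)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                # how strongly TD error biases sampling
        self.storage = []                 # (s, a, r, s', done) tuples
        self.priorities = np.zeros(capacity)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority, so each is seen at least once.
        max_p = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.storage)] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=p)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        w = (len(self.storage) * p[idx]) ** (-beta)
        return [self.storage[i] for i in idx], idx, w / w.max()

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Larger TD error -> higher priority on later sampling passes.
        self.priorities[idx] = np.abs(td_errors) + eps

After each SAC gradient step, the learner would call update_priorities with the batch's new TD errors, so hard-to-fit transitions are revisited more often, which is one way to raise data utilization on continuous control tasks.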
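The main-line plus auxiliary reward design for the vertical plane can likewise be illustrated with a short sketch. The actual reward terms and weights are not given above, so the decomposition and the coefficients k1, k2, k3 below are hypothetical:

def shaped_reward(track_err, pitch_err, pitch_rate, k1=1.0, k2=0.5, k3=0.2):
    # Main-line term: dense reward that grows as the AUV closes on the path.
    r_main = -k1 * abs(track_err)
    # Auxiliary term 1: keep the attitude aligned with the reference path,
    # discouraging the early-training "turning in place" behavior.
    r_attitude = -k2 * abs(pitch_err)
    # Auxiliary term 2: damp aggressive attitude changes driven by the
    # restoring moment in the vertical plane.
    r_rate = -k3 * abs(pitch_rate)
    return r_main + r_attitude + r_rate

print(shaped_reward(track_err=2.0, pitch_err=0.3, pitch_rate=0.1))   # -2.17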
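Finally, the multi-threaded, CPU-oriented rollout scheme for the three-dimensional task could be organized along the following lines. No specific framework is named above, so a plain threading layout with several actor threads feeding one shared queue is assumed, and the AUV simulator step is replaced by a dummy one-dimensional update:

import queue
import random
import threading

transitions = queue.Queue()            # shared experience queue, consumed by the learner

def actor_worker(worker_id, n_steps):
    rng = random.Random(worker_id)     # per-thread RNG for decorrelated rollouts
    state = 0.0
    for _ in range(n_steps):
        action = rng.uniform(-1.0, 1.0)      # stand-in for the policy network output
        next_state = state + 0.1 * action    # dummy dynamics; a real call into the
        reward = -abs(next_state)            # AUV simulator would go here
        transitions.put((state, action, reward, next_state))
        state = next_state

workers = [threading.Thread(target=actor_worker, args=(i, 1000)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(f"collected {transitions.qsize()} transitions using CPU threads only")

Note that in CPython the global interpreter lock limits true thread parallelism, so distributed RL implementations often use processes instead; the threaded layout here is only meant to show the actor-to-learner data flow that removes the single-GPU bottleneck.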