| With the widespread application of underwater detection and underwater operations, underwater vehicles are expected to achieve more intelligent motion control. Because an underwater vehicle operates in a complex environment and is susceptible to environmental disturbances, it is difficult to establish an accurate dynamics model. Adaptive, efficient controllers and the accurate execution of control forces have therefore become particularly critical. Reinforcement learning can adaptively adjust the control strategy to achieve optimal control when the model is unknown. In this paper, we take the Remotely Operated Vehicle (ROV) as the research object and study the application of reinforcement learning controllers to motion control. First, an improved thrust allocation method based on a genetic algorithm is proposed. The propulsion system model is established according to the relationship between the spatial layout of the thrusters and energy consumption. The genetic algorithm is used to solve the resulting nonlinear optimization problem, and the thrust allocation function is realized in simulation experiments. Second, a controller based on the Deep Deterministic Policy Gradient (DDPG) algorithm is designed. Through the selection of states and actions and the design of the reward function, the DDPG controller is able to control the motion of the ROV from sensor information. Adding a speed term and a negative exponential function to the reward function improves the performance of point stabilization control. Third, a supervised reinforcement learning method is proposed. DDPG requires an extensive amount of training data and a long training time to converge to a meaningful solution. Supervised Experience Replay and Behavioral Cloning are introduced to tackle this problem. A proportional-integral-derivative (PID) controller serves as the expert demonstration and guides the learning direction of DDPG. The convergence time of the two reinforcement learning
methods is compared through simulation experiments. Finally, a disturbance-rejection DDPG controller based on error estimation is proposed. To reduce the effect of interference in the actual environment, the motion state of the vehicle is estimated through support vector regression. The estimation error under disturbance is calculated, and a nonlinear compensation controller is used to eliminate it. The compensation controller then serves as the expert demonstration, so that the reinforcement learning controller remains robust under disturbance. Combining these approaches, this paper proposes an effective improvement to the energy-saving thrust allocation method for redundant propulsion systems and demonstrates a reinforcement learning strategy for ROV motion control. To accelerate reinforcement learning training, a supervised method is introduced that significantly improves the training efficiency of the DDPG controller. This paper explores the application of reinforcement learning to the motion control of ROVs. |
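
The abstract mentions a reward function improved by a speed term and a negative exponential function but does not give its form. The following is a minimal sketch of one way such a reward could look, not the formulation used in this work. The function name `shaped_reward`, the weights `k_pos` and `k_speed`, and the choice to subtract the speed term are all assumptions made for illustration.

```python
import math


def shaped_reward(position_error, speed, k_pos=1.0, k_speed=0.1):
    """Hypothetical shaped reward for point stabilization.

    position_error: distance from the vehicle to the target point (>= 0)
    speed: magnitude of the vehicle velocity (>= 0)
    k_pos, k_speed: illustrative weights, not values from the paper
    """
    # Negative exponential of the error: equals 1 at the target and
    # decays smoothly toward 0 as the vehicle moves away from it.
    proximity = math.exp(-k_pos * position_error)
    # Speed term: penalizes residual motion so the vehicle settles
    # at the target instead of passing through it.
    return proximity - k_speed * speed
```

Under these assumptions the reward is highest when the vehicle is at rest on the target, and for a fixed position error it decreases as speed grows.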