
Research On Path Tracking And Autonomous Obstacle Avoidance Of Autonomous Underwater Vehicle Based On EER-PPO Algorithm

Posted on: 2023-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
Full Text: PDF
GTID: 2568306614987209
Subject: Control Science and Engineering
Abstract/Summary:
The Autonomous Underwater Vehicle (AUV) has a wide range of applications in ocean exploration and is a key research direction in many countries. Path tracking is the basis for most AUV tasks, and autonomous obstacle avoidance ensures the safety of AUV operation, so research on AUV path tracking and autonomous obstacle avoidance control algorithms is of great significance.

AUVs performing path tracking and autonomous obstacle avoidance tasks are often affected by currents, waves, and other disturbances. Model-based AUV control methods not only require an accurate dynamic model of the AUV but also require the control parameters to be set in advance, which makes it difficult for them to cope with the changing seafloor environment. Reinforcement learning enables an agent to learn an optimal policy from a dynamic environment by maximizing the expected cumulative reward, and it adapts well to complex external environments. However, the state space of AUV path tracking and autonomous obstacle avoidance tasks is very large and the environment is complex, so purely value-based and purely policy-based reinforcement learning algorithms struggle to complete these tasks. Proximal Policy Optimization (PPO) is a stochastic policy gradient algorithm that combines the advantages of value-based and policy-based methods: it is model-free, strongly exploratory, and can handle problems in which both the state and action spaces are continuous. PPO has therefore been widely used in intelligent control.

However, PPO still has the following shortcomings when applied to AUV path tracking and autonomous obstacle avoidance: (1) PPO updates the policy network through random sampling, so on problems with a large state space it may be difficult to find the optimal control policy quickly, resulting in slow training of the policy network; (2) the upper and lower limits of the policy network's update range are fixed, so the training policy cannot be adjusted dynamically according to the training situation, which makes it difficult for the network to converge stably.

To address these problems, this thesis proposes a PPO algorithm based on an excellent experience set (Excellent Experience Replay PPO, EER-PPO). Compared with PPO, the method has two advantages: (1) an excellent-experience sampling mechanism is added on top of the original experience set, which increases the probability that the policy network finds the optimal control policy and speeds up network training; (2) EER-PPO dynamically adjusts the clip factor ε according to the current average cumulative reward and the historical average cumulative reward, so that in the early stage of training the policy network can enlarge its search space and is more likely to find an excellent policy, while in the later stage the update speed of the policy network is limited so that convergence is more stable.
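In PPO's clipped surrogate objective, L(θ) = E[min(r_t(θ) Â_t, clip(r_t(θ), 1−ε, 1+ε) Â_t)], the clip factor ε bounds how far each update can move the policy. The abstract describes the two EER-PPO modifications only qualitatively, so the following Python sketch is one plausible reading: ε widens while the current average cumulative reward still exceeds the historical average and tightens once improvement stalls, and an excellent experience set keeps the highest-return trajectories for additional sampling. All class, method, and parameter names (AdaptiveClipFactor, ExcellentExperienceSet, rate, and so on) are hypothetical and not taken from the thesis.

```python
import numpy as np

# Illustrative sketch of the two EER-PPO modifications described above.
# The exact update rules are not given in the abstract; this reading of
# "dynamically adjust the clip factor from the current and historical
# average cumulative rewards" is an assumption, as are all names below.

class AdaptiveClipFactor:
    """Adjusts PPO's clip factor epsilon from reward statistics."""

    def __init__(self, eps_init=0.2, eps_min=0.05, eps_max=0.4, rate=0.05):
        self.eps = eps_init
        self.eps_min, self.eps_max = eps_min, eps_max
        self.rate = rate
        self.history = []  # average cumulative reward of past iterations

    def update(self, current_avg_reward):
        if self.history:
            historical_avg = np.mean(self.history)
            if current_avg_reward > historical_avg:
                # Still improving (early phase): widen the clip range so
                # the policy can search a larger space.
                self.eps = min(self.eps * (1 + self.rate), self.eps_max)
            else:
                # Improvement has stalled (late phase): tighten the clip
                # range to limit update speed and stabilize convergence.
                self.eps = max(self.eps * (1 - self.rate), self.eps_min)
        self.history.append(current_avg_reward)
        return self.eps


class ExcellentExperienceSet:
    """Keeps the top-k trajectories by episode return for extra sampling."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.episodes = []  # (episode_return, transitions) pairs

    def maybe_add(self, episode_return, transitions):
        self.episodes.append((episode_return, transitions))
        self.episodes.sort(key=lambda e: e[0], reverse=True)
        del self.episodes[self.capacity:]  # drop everything below top-k

    def sample(self, rng):
        # Mixed with fresh on-policy data during the policy update.
        _, transitions = self.episodes[rng.integers(len(self.episodes))]
        return transitions
```

In a training loop, the returned ε would be plugged into the clipped surrogate objective, and minibatches from the ordinary rollout buffer would be mixed with samples drawn from the excellent experience set.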
For the simulation experiments, this thesis establishes the AUV dynamic model using the Newton-Euler equations and uses the Line of Sight (LOS) method to obtain the state information needed for path tracking and autonomous obstacle avoidance. A controller based on the EER-PPO algorithm is used to complete four groups of experiments: linear path tracking, sinusoidal path tracking, linear path tracking with autonomous obstacle avoidance, and sinusoidal path tracking with autonomous obstacle avoidance, and it is compared with controllers based on the PPO, DDPG, and TD3 algorithms. The experimental results show that the EER-PPO-based controller trains faster, converges more stably, and achieves better path tracking and autonomous obstacle avoidance performance.
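For reference, the LOS method mentioned above is commonly implemented as a lookahead-based guidance law that converts the vehicle's position relative to the current path segment into a cross-track error and a desired heading, which can then form part of the state vector supplied to the controller. The sketch below follows the standard straight-line formulation and may differ from the exact LOS variant used in the thesis; all names are illustrative.

```python
import math

# Standard lookahead-based LOS guidance for a straight path segment from
# waypoint p_k to p_k1 (Fossen-style formulation). The thesis may use a
# different LOS variant, so treat this as an illustrative reference.

def los_guidance(x, y, p_k, p_k1, lookahead):
    """Return (cross_track_error, desired_heading) for position (x, y)."""
    xk, yk = p_k
    xk1, yk1 = p_k1
    # Path-tangential angle of the current segment.
    alpha = math.atan2(yk1 - yk, xk1 - xk)
    # Cross-track error: lateral distance from the vehicle to the path.
    e = -(x - xk) * math.sin(alpha) + (y - yk) * math.cos(alpha)
    # Steer toward a point 'lookahead' metres ahead on the path, which
    # drives the cross-track error to zero.
    psi_d = alpha + math.atan2(-e, lookahead)
    return e, psi_d

# Example: vehicle at (2.0, 1.0) tracking the segment (0, 0) -> (10, 0).
e, psi_d = los_guidance(2.0, 1.0, (0.0, 0.0), (10.0, 0.0), lookahead=3.0)
# e = 1.0 (one metre off the path); psi_d is approximately -0.32 rad,
# steering the vehicle back toward the path.
```

In the reinforcement learning setting, the cross-track error and the heading error ψ_d − ψ would typically be among the state inputs to the policy network.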
Keywords/Search Tags: Autonomous Underwater Vehicle, Proximal Policy Optimization, Path Following, Obstacle Avoidance