
Research on AUV Motion Planning Method Based on Maximum Entropy Deep Reinforcement Learning

Posted on: 2023-01-31  Degree: Master  Type: Thesis
Country: China  Candidate: X Yu  Full Text: PDF
GTID: 2530306905469404  Subject: Design and manufacture of ships and marine structures
Abstract/Summary:
This research explores how an autonomous underwater vehicle (AUV) can rely on global path information and local information obtained by sensors to make decisions efficiently and quickly in an unknown, complex environment, so as to avoid dense obstacles of various shapes, reach a specified target location, and complete the motion planning task while satisfying multiple constraints. To address the problems of poor exploration ability, single-mode strategies, high training cost, and sparse reward environments in the AUV motion planning task, an end-to-end motion planning system based on deep reinforcement learning is proposed. To solve these problems and improve AUV motion planning, the following work is carried out:

(1) Considering the constraints of system dynamics, sensor performance, obstacle collision ranges, and ocean current disturbance, the complex motion planning problem is formulated. Based on a neural network model, an end-to-end motion planning architecture mapping state information to action outputs is constructed, and a state space based on position, velocity, and obstacle information is defined. A simple sonar model is built to realize local obstacle avoidance, and the sonar dead-zone problem is studied. The AUV's action space is then determined, and the action values output by the neural network are clipped and linearly transformed.

(2) A motion planning system based on the soft actor-critic (SAC) algorithm is designed, in which the maximum entropy method increases the randomness of the policy and thereby enhances the AUV's ability to explore the environment. To address the problem of sparse environmental rewards, the motion planning task is decomposed and a comprehensive external reward function is designed, which guides the AUV toward the target point while constraining its navigation state and optimizing navigation distance and time.

(3) To overcome the difficulty and time cost of learning a policy from scratch in reinforcement learning, generative adversarial imitation learning (GAIL) is introduced to assist AUV training, with expert policies used to guide the AUV's learning. Furthermore, a combined SAC-GAIL algorithm is proposed, which is trained on a mixture of GAIL internal reward signals and external reward signals, reducing the cost of interaction between the AUV and the environment. By coordinating the weights of the internal and external rewards, the GAIL reward signal guides the AUV's navigation and encourages it to discover external environmental rewards.

(4) Based on visual simulation in Unity3D, a randomly distributed dense-obstacle environment is constructed, the episode termination criteria for the training process are defined, and appropriate reward values and algorithm parameters are selected. For single-target-point and multi-target-point tasks, motion planning systems based on the PPO, SAC, and SAC-GAIL algorithms are trained respectively, and the training results are analyzed. Using the trained policies, randomly generated target-point sequences are used to test and compare the algorithms. Good results are finally obtained, which verifies the effectiveness and stability of the proposed algorithm and demonstrates its advantages.
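The action post-processing described in (1) can be illustrated with a minimal sketch: the network's raw output is clipped to [-1, 1] and then linearly mapped to each actuator's physical range. The actuator ranges below (thrust and rudder angle) are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def scale_action(raw_action, low, high):
    """Clip a raw network output to [-1, 1], then linearly map it to [low, high]."""
    a = np.clip(raw_action, -1.0, 1.0)
    return low + 0.5 * (a + 1.0) * (high - low)

# Hypothetical 2-D action: thrust in [0, 30] N, rudder angle in [-0.5, 0.5] rad.
low = np.array([0.0, -0.5])
high = np.array([30.0, 0.5])
action = scale_action(np.array([1.7, -0.2]), low, high)  # raw thrust 1.7 is clipped to 1.0
```

This keeps the policy network's output range fixed regardless of the vehicle's actuator limits, which is a common convention in continuous-control reinforcement learning.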
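A composite external reward of the kind described in (2) might combine a dense progress term with sparse terminal terms. The following sketch is an assumption about the general shape of such a function; the weights, penalties, and bonuses are placeholders, not the thesis's actual values.

```python
def external_reward(dist_prev, dist_now, collided, reached,
                    w_progress=1.0, step_penalty=0.01,
                    goal_bonus=10.0, collision_penalty=-10.0):
    """Illustrative composite reward: terminal bonus/penalty plus dense shaping."""
    if collided:
        return collision_penalty          # sparse penalty for hitting an obstacle
    if reached:
        return goal_bonus                 # sparse bonus for reaching the target
    # Dense shaping: reward progress toward the target, penalize elapsed time.
    return w_progress * (dist_prev - dist_now) - step_penalty
```

The dense progress term mitigates the sparse-reward problem the abstract mentions, since the agent receives a learning signal at every step rather than only at episode end.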
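The reward mixing in the SAC-GAIL combination of (3) reduces to a weighted sum of the GAIL discriminator's internal (imitation) reward and the environment's external reward. This sketch only shows that mixing step; the weight values are assumptions, and in practice the internal weight might be annealed as training progresses.

```python
def mixed_reward(r_gail, r_env, w_int=0.5, w_ext=0.5):
    """Weighted mixture of GAIL's internal reward and the external environment reward."""
    return w_int * r_gail + w_ext * r_env
```

Tuning the two weights trades off imitating the expert (which accelerates early learning) against discovering external environmental rewards, matching the coordination described in the abstract.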
Keywords/Search Tags: Autonomous Underwater Vehicle, Motion planning, Obstacle avoidance, Deep reinforcement learning, Soft Actor-Critic, Generative adversarial imitation learning