| With the rapid development of modern science and technology, the exploitation of underwater resources and marine scientific research have become increasingly important. Autonomous underwater vehicle (AUV) swarm formation has become an effective means of marine scientific research, and task allocation is a key link in swarm formation. In an AUV cluster formation system, long training times and excessively large AUV state and action spaces give rise to the curse of dimensionality, which prevents the formation from assigning tasks effectively. To address these problems, this paper studies and improves task assignment for AUV cluster formations from two aspects: task allocation efficiency and resource consumption. The work is as follows. First, to address insufficient formation stability while AUV formations perform tasks, an AUV formation-keeping controller is designed and a formation-keeping control algorithm based on Actor-Critic is proposed. The three controlled quantities of an AUV formation navigating in three-dimensional space are determined, namely speed, yaw angle and pitch angle. The Actor-Critic algorithm is then improved, and an Actor-Critic dual-network structure is designed to realize the formation-keeping controller. The global deviation state values of speed, yaw angle and pitch angle are used as the controller input, and the strong fitting and approximation ability of the neural network is used to process this input so that the controlled quantities of the follower AUV become consistent with the target quantities of the leader AUV. Finally, the tracking distance between the leader and the follower is analyzed along the x, y and z directions in simulation, which verifies that the AUV formation can be maintained stably. Second, on the premise that the formation can be maintained stably, a Neural Network Deterministic Policy Gradient based on Shared Prioritized Experience Replay (SP-NNDPG) algorithm is proposed to solve the task assignment problem for hierarchically structured AUV formations. The constraints and rewards of AUV formation task execution are first analyzed, and a constraint function and a reward function are designed to restrict and guide the formation, respectively, so that it performs tasks safely. To address the curse of dimensionality faced by AUV formations in complex environments, the deterministic policy gradient is used to reduce the dimensionality of the state-action space; to address the long training time of traditional reinforcement learning task allocation, shared prioritized experience replay is adopted to improve the convergence speed of the algorithm and reduce the training time of task assignment. The framework and overall process of the SP-NNDPG algorithm are then designed. Finally, simulation experiments are carried out in three different 3D scenarios to realize AUV formation task assignment, and the advantages of the proposed algorithm are verified using three indicators: average reward value, average voyage length and task completion rate. These results provide a useful reference for solving the AUV formation task assignment problem in practical applications. |
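The abstract does not specify how the shared prioritized experience replay is implemented. As a rough illustration of the general idea only, the following sketch assumes proportional prioritization (sampling probability proportional to a power of the absolute temporal-difference error) and a single buffer into which all AUVs write. The class name `SharedPrioritizedReplay`, its parameters `alpha` and `eps`, and the agent identifiers are hypothetical and are not taken from the SP-NNDPG design.

```python
import random


class SharedPrioritizedReplay:
    """One replay buffer shared by several agents (hypothetical sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=None):
        self.capacity = capacity
        self.alpha = alpha      # assumed exponent: 0 gives uniform sampling
        self.eps = eps          # keeps every priority strictly positive
        self.items = []         # stored (agent_id, transition) pairs
        self.priorities = []    # one priority per stored pair
        self.next_slot = 0      # ring-buffer write position
        self.rng = random.Random(seed)

    def add(self, agent_id, transition, td_error=None):
        # A transition with unknown error gets the current maximum priority,
        # so it is likely to be replayed at least once.
        if td_error is None:
            priority = max(self.priorities, default=1.0)
        else:
            priority = (abs(td_error) + self.eps) ** self.alpha
        entry = (agent_id, transition)
        if len(self.items) < self.capacity:
            self.items.append(entry)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest entry.
            self.items[self.next_slot] = entry
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size):
        # Draw indices with replacement, weighted by stored priority.
        indices = self.rng.choices(
            range(len(self.items)), weights=self.priorities, k=batch_size
        )
        return indices, [self.items[i] for i in indices]

    def update(self, indices, td_errors):
        # Refresh priorities after the learner recomputes the errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```

In such a setup every AUV would call `add` with its own transitions and the learner would call `sample` and then `update`, so that informative experience gathered by one vehicle can be replayed when training the others. Whether SP-NNDPG shares experience in exactly this way is an assumption.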