As research on artificial intelligence deepens, AI techniques are increasingly being applied to key problems in military, industrial production, and other fields, including the cooperative pursuit problem in multi-agent systems. Research on cooperative pursuit strategies for multi-agent systems can help solve control problems in tasks such as satellite orbiting, spacecraft tracking, and merchant ship escort. Most current research on the multi-agent pursuit-evasion problem is based on classical control theory. However, in multi-agent scenarios the maneuvering strategy of the evading agent may be unknown, and in such cases the controller parameters of the pursuing agents are difficult to design by hand. Deep reinforcement learning, given a well-designed reward function, can produce end-to-end pursuit strategies without requiring complex mathematical models of the agents.

This paper introduces deep reinforcement learning into multi-agent cooperative pursuit scenarios. First, the pursuit scenario is modeled and the environment is designed, with precise definitions of the pursuing agents' observation space, action space, reward function, and other elements. The reward function is divided into team rewards and individual rewards. To address the lazy-agent phenomenon during pursuit, a team reward allocation method based on the Hungarian algorithm is proposed; it uses the relative distances as the cost matrix to allocate the team reward reasonably among the pursuing agents. Furthermore, unlike traditional methods in which the evading agent follows a fixed evasion strategy, or other reinforcement learning methods in which it performs a random walk, this paper uses the artificial potential field method as the evading agent's evasion strategy. This makes the pursuit scenario fully cooperative and reduces the learning difficulty for the reinforcement learning algorithm.

On the algorithmic side, this paper improves the multi-agent deep deterministic policy gradient (MADDPG) algorithm, one of the most widely used algorithms in multi-agent reinforcement learning. To address low sample efficiency and the inaccurate Q-value estimates of the Critic network, a prioritized experience replay mechanism is introduced and the Critic network structure is modified. In addition, the Gumbel-Softmax sampling strategy is introduced so that MADDPG can be applied in a discrete action space. Finally, to verify the effectiveness of the pursuit strategy obtained through reinforcement learning training, this paper designs and carries out a real-world multi-vehicle cooperative pursuit experiment. A buoy-type underwater acoustic communication relay station provides a bidirectional communication link from the shore-based control platform to the underwater vehicles. The pursuit strategy is deployed to each vehicle under centralized control, and the experiment verifies its effectiveness.
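The team reward allocation based on the Hungarian algorithm can be illustrated with a minimal sketch. The abstract states only that relative distances form the cost matrix, so this sketch makes two assumptions. First, the columns of the matrix are hypothetical encirclement points around the evader. Second, each pursuer's share of the team reward is weighted by the inverse of its distance to its assigned point, which is a hypothetical rule. The names `hungarian` and `allocate_team_reward` are illustrative and are not taken from the paper.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment via the Hungarian (Kuhn-Munkres) algorithm with
    dual potentials. cost is an n x m list of lists with n <= m.
    Returns assignment[i] = column assigned to row i."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 = unmatched)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so that a new zero reduced-cost edge appears.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip matched/unmatched edges along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def allocate_team_reward(pursuers, slots, team_reward):
    """Hypothetical allocation: pursuers are matched to (assumed) encirclement
    points by minimum total distance, then the team reward is split so that
    pursuers closer to their assigned point receive a larger share."""
    cost = [[math.dist(p, s) for s in slots] for p in pursuers]
    assignment = hungarian(cost)
    # Hypothetical weighting rule, not taken from the paper: 1 / (1 + distance).
    weights = [1.0 / (1.0 + cost[i][j]) for i, j in enumerate(assignment)]
    total = sum(weights)
    shares = [team_reward * w / total for w in weights]
    return assignment, shares
```

For example, `allocate_team_reward([(0, 0), (10, 0), (5, 8)], [(9, 0), (1, 0), (5, 5)], 10.0)` matches the pursuers to points `[1, 0, 2]` and returns shares of 4.0, 4.0, and 2.0. A pursuer that stays far from its assigned point therefore receives less of the team reward, which is one way to discourage lazy agents.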
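The artificial potential field evasion strategy can be sketched as follows, assuming the classic repulsive potential U_rep = 0.5 * k * (1/d - 1/rho0)^2. This potential is active only when a pursuer is within an influence radius rho0. The evader moves along the resulting negative gradient at a capped speed. The function name `apf_evader_step` and the parameters `k_rep`, `rho0`, `max_speed`, and `dt` are hypothetical. The paper's field may also include terms that this sketch omits, such as boundary repulsion.

```python
import math


def apf_evader_step(evader, pursuers, k_rep=1.0, rho0=5.0, max_speed=1.0, dt=0.1):
    """One evader step under a purely repulsive artificial potential field.
    All parameter values are hypothetical placeholders."""
    fx = fy = 0.0
    for px, py in pursuers:
        dx, dy = evader[0] - px, evader[1] - py
        d = math.hypot(dx, dy)
        if 0.0 < d < rho0:
            # Negative gradient of 0.5 * k * (1/d - 1/rho0)^2: points away from the pursuer.
            mag = k_rep * (1.0 / d - 1.0 / rho0) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return evader  # no pursuer in range (or forces cancel): stay put
    speed = min(max_speed, norm)  # cap the evader's speed
    return (evader[0] + speed * fx / norm * dt,
            evader[1] + speed * fy / norm * dt)
```

Because this evader is a fixed, non-learning policy, the pursuers face a stationary opponent, which is consistent with treating the scenario as fully cooperative. The sketch also exhibits a known limitation of potential fields. If pursuers surround the evader symmetrically, the forces cancel and the evader stops, because it is trapped in a local minimum of the field.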
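The prioritized experience replay mechanism can be sketched as follows, assuming the common proportional variant. In this variant, sampling probability is P(i) = p_i^alpha / sum_k p_k^alpha with p_i = |TD error| + eps, and importance-sampling weights w_i = (N * P(i))^(-beta) correct the resulting bias. The abstract does not say which variant is used. The class name and the values of `alpha`, `beta`, and `eps` are assumptions. A plain list (O(N) sampling) stands in for the sum-tree usually used in practice.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (list-based sketch; hypothetical class)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        # Importance-sampling weights, normalised by the batch maximum (a common simplification).
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # After a Critic update, priorities track the new absolute TD errors.
        for i, delta in zip(idx, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In an MADDPG-style training loop, the returned weights would multiply each sample's squared TD error in the Critic loss, and `update_priorities` would be called with the new TD errors of the sampled batch.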
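The Gumbel-Softmax sampling that adapts MADDPG to a discrete action space can be sketched in plain Python. Gumbel(0, 1) noise g = -log(-log(U)) is added to the actor's logits, and the result is passed through a softmax with temperature tau. With `hard=True`, the sketch returns the one-hot argmax, which is the action actually sent to the environment. The function name and its defaults are illustrative assumptions.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, hard=False, rng=random):
    """Draw a relaxed one-hot sample from a categorical distribution given by logits."""
    gumbels = []
    for _ in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbels.append(-math.log(-math.log(u)))  # Gumbel(0, 1) noise
    y = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(y)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - m) for v in y]
    s = sum(exps)
    probs = [e / s for e in exps]
    if hard:
        # Gumbel-max trick: the argmax is an exact sample from softmax(logits).
        k = probs.index(max(probs))
        return [1.0 if i == k else 0.0 for i in range(len(probs))]
    return probs
```

This plain-Python version shows only the forward sampling. In an autodiff framework, PyTorch's `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=True)` returns the one-hot sample in the forward pass but uses the gradient of the soft sample (the straight-through estimator). This is what allows the deterministic policy gradient in MADDPG to pass through discrete actions.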