The emerging technology of multi-agent autonomous collaborative control, driven by AI-enabled multidisciplinary integration, holds a significant strategic position in advancing technological innovation and industrial development, shaping the future of warfare, and safeguarding national security and strategic interests. It therefore has broad application prospects in both the socio-economic and defense domains. However, its practical application in complex dynamic environments must still address key issues such as weak environmental analysis capabilities, non-optimal and non-stationary policies, and poor policy-environment adaptability. This dissertation focuses on these issues and studies multi-agent autonomous collaborative control methods in complex dynamic environments, including stochastic control-noise environments, communication-restricted environments, stochastic dynamic environments, and input-constrained environments.

(1) To address the slow system consensus and suboptimal control policies of multi-agent flocking control algorithms in stochastic noise environments, this dissertation proposes a novel deep reinforcement learning-based leader-follower flocking autonomous collaborative control algorithm. First, a graph neural network-based multi-agent deep reinforcement learning model suitable for flocking control systems is constructed. A graph neural network module with loss weights is designed for this model to extract features of the structural information of multi-agent flocking systems, enhancing the agents' ability to analyze system structure. In addition, to improve training efficiency, the dissertation exploits the homogeneity of the agents and designs a cooperative learning framework based on network and experience sharing among agents (a minimal sketch of this idea follows below). The results demonstrate that the proposed reinforcement learning algorithm outperforms traditional reinforcement learning algorithms in learning effectiveness and convergence speed. Moreover, in stochastic control-noise environments, the proposed leader-follower flocking control algorithm achieves faster consensus and better control stability than traditional flocking control algorithms and other reinforcement learning-based flocking control algorithms.

(2) To further address communication adaptation and system scalability of multi-agent flocking control systems in communication-constrained environments, this dissertation proposes a graph-attention-based multi-agent reinforcement learning algorithm for autonomous flocking collaborative control. A distance-based graph attention module is introduced into the policy network; it reduces the influence of information from distant agents with poor communication quality on the current agent's decision-making, resolving the agents' adaptation problem in communication-constrained environments. The module also aggregates neighbor information into a fixed-size representation, so the parameters of the policy network are independent of the number of agents, allowing the policy to adapt to dynamically varying multi-agent flocking systems. Using the learned policies, agents achieve distributed autonomous control from local communication information alone. The results demonstrate that, compared with existing flocking control algorithms, the proposed algorithm performs well under communication delays and communication distance restrictions while simultaneously adapting to dynamic changes in the number of agents.
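The following is a minimal sketch of the cooperative learning idea in contribution (1), not the dissertation's implementation: because the agents are homogeneous, they can share a single policy network and a single experience replay buffer, so every agent's transitions update the same parameters. All names and dimensions here are illustrative assumptions.

```python
# Sketch: homogeneous agents sharing one policy network and one replay buffer.
import random
from collections import deque

import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One policy network reused by every (homogeneous) agent."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = SharedPolicy(obs_dim=8, act_dim=2)   # shared by all agents
buffer = deque(maxlen=100_000)                # shared experience pool

def store(obs, act, rew, next_obs):
    # Every agent writes into the same buffer; agent identity is irrelevant
    # for learning because the agents are interchangeable.
    buffer.append((obs, act, rew, next_obs))

def sample_batch(batch_size: int = 64):
    # Any agent's transitions can train the shared policy.
    batch = random.sample(buffer, batch_size)
    obs, act, rew, nxt = (
        torch.stack([torch.as_tensor(x[i], dtype=torch.float32) for x in batch])
        for i in range(4)
    )
    return obs, act, rew, nxt
```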
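The following is a minimal sketch of a distance-weighted graph attention aggregation in the spirit of contribution (2); the penalty form and feature sizes are assumptions, not the dissertation's exact formulation. Attention logits are penalized by inter-agent distance, so distant (poorly connected) neighbors contribute less, and the weighted sum produces a fixed-size output regardless of the neighbor count.

```python
# Sketch: distance-penalized graph attention over neighbor features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceGraphAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, dist_scale: float = 1.0):
        super().__init__()
        self.query = nn.Linear(feat_dim, hidden_dim)
        self.key = nn.Linear(feat_dim, hidden_dim)
        self.value = nn.Linear(feat_dim, hidden_dim)
        self.dist_scale = dist_scale  # how strongly distance suppresses attention

    def forward(self, self_feat, neigh_feats, distances):
        # self_feat: (F,), neigh_feats: (N, F), distances: (N,)
        q = self.query(self_feat)                      # (H,)
        k = self.key(neigh_feats)                      # (N, H)
        logits = k @ q / k.shape[-1] ** 0.5            # (N,) scaled dot-product
        logits = logits - self.dist_scale * distances  # distance penalty
        weights = F.softmax(logits, dim=0)             # (N,)
        return weights @ self.value(neigh_feats)       # (H,) fixed-size output
```

Because the output size is independent of N, the downstream policy layers never see the neighbor count, which is what lets one trained policy handle a dynamically varying number of agents.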
(3) To further address the adaptability of multi-agent flocking autonomous collaborative control systems in stochastic dynamic environments, this dissertation designs a distance-based graph autoencoder built on the distance-based graph attention mechanism and integrates it into the policy and evaluation networks of multi-agent reinforcement learning to process and aggregate the internal system information and the external obstacle information in each agent's observation. The autoencoder enables agents to focus on the important information of the system and environment, mitigates the non-stationary state transitions caused by stochastic communication and stochastic dynamic obstacles, improves the agents' understanding of the observed state, and achieves adaptability to scenarios with dynamically varying obstacle scales. In addition, a policy evaluation method based on a fused reward is adopted to minimize the control losses incurred by the flocking and obstacle-avoidance processes (a sketch of such a fused reward appears below). The results demonstrate that the proposed algorithm is significantly superior to other reinforcement learning-based flocking collaboration algorithms in adaptability to stochastic dynamic environments and in global control policy optimization; its adaptability to varying numbers of agents and obstacles is also fully validated.

(4) To improve the practicality of multi-agent flocking autonomous collaborative control algorithms in input-constrained environments, this dissertation proposes a model-learning-based multi-agent flocking control system framework. First, an agent motion model construction method based on a sequential attention neural network is proposed to provide a more realistic agent motion model for environmental interaction. Meanwhile, the cooperative learning framework and the distance-based graph autoencoder module are incorporated into the soft actor-critic (SAC) algorithm, yielding a multi-agent cooperative SAC (MACSAC) algorithm for optimizing the control policy model. A digital learning system for multi-agent flocking collaborative control is then constructed by combining the learned motion model with the MACSAC algorithm, enabling off-line policy optimization. Finally, the dissertation designs a behavior reasoning (BR) model based on the prior policy and incorporates it into the MACSAC algorithm to infer the motion states of non-communicating neighboring agents, deepening each agent's perception of the environment and resolving the policy degradation caused by missing observation information in input-constrained environments (sketches of the motion model and the BR idea follow below). The results indicate that the digital learning system effectively simulates the policy learning of multi-agent flocking in realistic scenarios, and that the BR model effectively improves the control stability and adaptability of the MACSAC-based multi-agent flocking collaborative control algorithm in input-constrained environments.
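The following is a minimal sketch of the fused-reward idea in contribution (3): the policy is evaluated against a weighted combination of a flocking (consensus) term and an obstacle-avoidance term, so optimizing one does not silently sacrifice the other. The specific terms, weights, and safety threshold are illustrative assumptions.

```python
# Sketch: reward fusing velocity consensus with obstacle-avoidance penalties.
import numpy as np

def fused_reward(positions, velocities, obstacles,
                 w_flock: float = 1.0, w_avoid: float = 1.0,
                 safe_dist: float = 1.0) -> float:
    positions = np.asarray(positions)    # (N, 2) agent positions
    velocities = np.asarray(velocities)  # (N, 2) agent velocities
    # Flocking term: penalize velocity disagreement (consensus error).
    vel_err = np.linalg.norm(velocities - velocities.mean(axis=0), axis=1).mean()
    # Avoidance term: penalize agents within safe_dist of any obstacle.
    penalty = 0.0
    for obs in np.asarray(obstacles):    # (M, 2) obstacle positions
        d = np.linalg.norm(positions - obs, axis=1)
        penalty += np.clip(safe_dist - d, 0.0, None).sum()
    # Higher is better: both terms are losses, so negate the weighted sum.
    return -(w_flock * vel_err + w_avoid * penalty)
```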
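The following is a minimal sketch of learning an agent motion model with a sequential attention network, in the spirit of contribution (4); the architecture details are assumptions. The model is trained on logged transitions to predict the next kinematic state from a history of state-action pairs, after which it can stand in for the real plant during off-line policy optimization.

```python
# Sketch: attention over a state-action history to predict the next state.
import torch
import torch.nn as nn

class MotionModel(nn.Module):
    def __init__(self, state_dim: int, act_dim: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(state_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, state_dim)

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, act_dim)
        x = self.embed(torch.cat([states, actions], dim=-1))  # (B, T, d_model)
        h = self.encoder(x)                 # self-attention over the time axis
        return self.head(h[:, -1])          # next state from the final step

model = MotionModel(state_dim=4, act_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(states, actions, next_state):
    # Supervised regression on transitions logged from the real system.
    loss = nn.functional.mse_loss(model(states, actions), next_state)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```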
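Finally, a minimal sketch of the behavior-reasoning idea in contribution (4), under the assumption (not stated in this abstract) of a simple double-integrator motion model: when a neighbor stops communicating, the agent predicts that neighbor's action with the shared prior policy and rolls its last known state forward, filling the gap in its own observation.

```python
# Sketch: inferring a silent neighbor's state from the shared prior policy.
import torch

@torch.no_grad()
def infer_neighbor_state(policy, last_obs, last_pos, last_vel, dt: float = 0.1):
    """Estimate a non-communicating neighbor's current position and velocity.

    policy   -- the shared prior policy (e.g., a network like SharedPolicy above)
    last_obs -- the neighbor's last communicated observation vector
    last_pos, last_vel -- its last communicated kinematic state (tensors)
    """
    # Homogeneity assumption: the neighbor acts as we would, so our own
    # policy is a reasonable model of its behavior.
    accel = policy(torch.as_tensor(last_obs, dtype=torch.float32))
    vel = last_vel + accel * dt   # predicted velocity one step ahead
    pos = last_pos + vel * dt     # predicted position one step ahead
    return pos, vel
```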
The four parts of this dissertation progress from simpler to more complex environments, using reinforcement learning as the main research method to solve the problem of multi-agent autonomous collaborative control in complex dynamic environments. This study provides a novel and effective research approach for future work on multi-agent collaborative systems, and it has important theoretical significance and application value in promoting the application of multi-agent collaborative technology in the socio-economic and military fields.