| A multi-agent system is a system composed of multiple agents that interact with a shared environment. The agents exchange information and coordinate their actions to achieve common goals or solve common problems. By combining reinforcement learning with deep neural networks, deep reinforcement learning greatly improves the feature extraction and representation ability of reinforcement learning algorithms, strengthens the agent's perception of and adaptability to the environment, and can solve more complex decision-making problems. Deep reinforcement learning has also become the mainstream method for decision making in multi-agent systems. In a multi-agent adversarial game environment, the actions and strategy choices of the agents affect one another, and each agent must consider its own actions and returns as well as the behavior of the other agents. Because the environment is unstable, the experience replay pool stores a large number of inefficient samples, which reduces the learning efficiency of the agents. In complex multi-agent decision-making tasks, different stages have different task objectives, which further increases the difficulty of policy learning. These problems seriously degrade the performance of deep reinforcement learning algorithms in multi-agent environments. At present, most multi-agent training environments are either built on game platforms to verify the effectiveness of algorithms or are simulations of specific tasks. General-purpose reinforcement learning training environments are relatively few, so constructing a multi-agent training environment in which task environments can be set up quickly and agent functions can be customized is also an urgent problem. To address these problems, this paper reviews the historical development of multi-agent reinforcement learning and builds on existing work. The research content consists of the following three parts: (1) Aiming at the problem of 
low sample efficiency of the experience replay pool in multi-agent environments, a multi-level experience replay pool method is proposed. First, the update mechanism of the experience replay pool is improved and a similar-sample filtering step is added. Second, each sample is given a priority weight, and sample screening adapted to the current state is added, to improve sampling efficiency. (2) For the multi-objective problem in complex multi-agent tasks, a multi-agent value decomposition method based on observation information trade-off is proposed. The algorithm applies an attention mechanism within an existing value decomposition network so that the agent policy network attends more to the information that is most critical to the current task objective, allowing the agent to adapt to more complex training environments and improving the convergence speed of the algorithm. (3) To develop a multi-agent reinforcement learning training environment that supports rapid task design and agent design, this paper takes unmanned swarm confrontation as the background and uses the open-source ML-Agents framework for the Unity engine to design and implement a simulation training platform for multi-agent adversarial games. Functions including training environment visualization, environment design, and agent behavior constraints are implemented. The multi-agent reinforcement learning algorithms proposed in this paper are also verified experimentally in this environment, demonstrating their feasibility and superiority. |
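
The two ideas named in part (1), filtering similar samples on insertion and sampling by a per-sample priority weight, can be illustrated with a small sketch. This is not the multi-level method proposed in the paper: the class name `FilteredPriorityReplay`, the Euclidean distance test, the `min_distance` threshold, and the caller-supplied fixed priority are all assumptions made for illustration, and the multi-level structure and state-adapted screening are not modeled.

```python
import math
import random
from collections import deque


class FilteredPriorityReplay:
    """Toy replay pool (hypothetical): drops near-duplicates, samples by priority."""

    def __init__(self, capacity=1000, min_distance=0.1, seed=0):
        # Each stored sample is a (state, action, reward, next_state) tuple.
        self.samples = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        # Assumed similarity threshold; the paper does not specify one here.
        self.min_distance = min_distance
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, priority=1.0):
        """Store a transition unless a similar one is already in the pool."""
        for stored_state, stored_action, _, _ in self.samples:
            same_action = stored_action == action
            close_state = math.dist(stored_state, state) < self.min_distance
            if same_action and close_state:
                return False  # filtered out as a similar sample
        self.samples.append((state, action, reward, next_state))
        self.priorities.append(float(priority))
        return True

    def sample(self, batch_size):
        """Draw a batch with replacement, weighted by priority."""
        return self.rng.choices(
            list(self.samples), weights=list(self.priorities), k=batch_size
        )


pool = FilteredPriorityReplay(min_distance=0.1)
pool.add((0.0, 0.0), 0, 1.0, (0.1, 0.0), priority=1.0)
pool.add((0.01, 0.0), 0, 1.0, (0.1, 0.0))  # rejected: too close to the first
pool.add((1.0, 1.0), 1, 0.0, (1.0, 1.0), priority=0.0)
batch = pool.sample(5)
```

The linear scan in `add` is only suitable for small pools; a practical filter would need an index or a hashed state representation, and priorities would normally be updated during training rather than fixed at insertion.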