In multi-agent systems, only part of the information in an agent's original observation is crucial to selecting the optimal policy; the irrelevant remainder acts as noise that interferes with this selection. However, agents struggle to learn effective attention over the different parts of the observation and thus cannot reduce the negative impact of irrelevant information. In complex settings, the size of the observation space grows exponentially with the number of agents, and this large-scale observation space aggravates the redundancy in the original observation. Irrelevant and redundant information degrades the learned policies and hinders the RL agent from learning stably and efficiently. In this thesis, we propose a novel network architecture, the partial observation division and policy mixing network (ODPM), to address the negative impact of irrelevant information. ODPM uses an end-to-end trained policy network to divide the agent's original observation into groups. For the representation of each group, a local value estimation module computes the value corresponding to that group's information, and ODPM then applies an attention mechanism to aggregate these values into a correction to the original policy used for the agent's interaction with the environment. In this way, ODPM lets agents pay fine-grained attention to key information while combining irrelevant information at a coarse granularity, reducing the negative influence of irrelevant and redundant information on the current policy and improving training stability and performance. We conduct experiments in two classic multi-agent settings, MAgent Battle and SMAC. Experimental results show that ODPM improves the performance of state-of-the-art DRL approaches compared with several attention-based network architectures.
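The division-then-mixing pipeline described above can be illustrated with a minimal sketch, assuming a fixed equal-size partition of the observation into groups and a discrete action space; all names below (ODPMSketch, group_value, attn, and so on) are illustrative assumptions, not the implementation from the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ODPMSketch(nn.Module):
    """Sketch of observation division + attention-weighted policy mixing."""

    def __init__(self, obs_dim, n_groups, hidden_dim, n_actions):
        super().__init__()
        assert obs_dim % n_groups == 0, "assumes equal-sized observation groups"
        self.n_groups = n_groups
        group_dim = obs_dim // n_groups
        # Base policy over the full original observation.
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )
        # Local value estimation: a shared network mapping each
        # observation group to per-action values.
        self.group_value = nn.Sequential(
            nn.Linear(group_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_actions),
        )
        # Attention scores decide how strongly each group's values
        # contribute to the policy correction.
        self.attn = nn.Linear(group_dim, 1)

    def forward(self, obs):
        # obs: (batch, obs_dim)
        logits = self.policy(obs)                         # base policy logits
        groups = obs.view(obs.size(0), self.n_groups, -1) # divide observation
        values = self.group_value(groups)                 # (batch, n_groups, n_actions)
        scores = F.softmax(self.attn(groups), dim=1)      # (batch, n_groups, 1)
        correction = (scores * values).sum(dim=1)         # attention-weighted mix
        return logits + correction                        # corrected policy logits

The key design point this sketch captures is that the correction is additive: groups the attention deems irrelevant receive small weights and are folded in coarsely, while high-weight groups contribute fine-grained value information, and the whole path remains differentiable for end-to-end training.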