Multi-agent reinforcement learning is an important extension of reinforcement learning that provides solutions for decision-making tasks involving multiple controlled entities in real-world scenarios. In recent years, deep multi-agent reinforcement learning has achieved substantial breakthroughs in tasks that require perceiving high-dimensional raw inputs and making control decisions. However, while deep multi-agent reinforcement learning expands the scope of reinforcement learning, it also inherits its defects. In offline multi-agent reinforcement learning, existing algorithms suffer from extrapolation errors caused by distribution shift, and these errors accumulate faster as the number of agents grows; in online multi-agent reinforcement learning, the complexity of the environment and the interaction of agent policies lead to low sample efficiency, slow convergence, and model instability. To address these problems, this thesis studies two self-supervised optimization methods for multi-agent reinforcement learning, one for the offline setting and one for the online setting, both of which yield significant performance improvements when applied to their baseline algorithms. The effectiveness of the methods is verified in the SMAC environment. The main research contents are as follows:

(1) To address distribution shift in the offline setting, this thesis proposes an offline multi-agent reinforcement learning algorithm based on state augmentation. It expands the offline dataset through data augmentation of states, introduces local exploration into the offline algorithm, and reduces the algorithm's sensitivity to out-of-distribution states. The state augmentation method is combined with the baseline algorithm MACQL. Experimental results show that even a simple data augmentation method can effectively alleviate the extrapolation error caused by distribution shift, allowing the algorithm to fit the Q-values more accurately. In addition, given the lack of widely accepted datasets for offline multi-agent reinforcement learning, this thesis provides a self-built multi-agent offline dataset collected in the SMAC environment, enabling a more objective evaluation of algorithm performance.

(2) To address low sample efficiency and model instability in the online setting, this thesis proposes an online multi-agent reinforcement learning algorithm based on state prediction. By constructing a self-supervised state prediction auxiliary task, the state representation learning ability of the reinforcement learning model is improved, thereby speeding up convergence. The state prediction task is added to the baseline algorithm QMIX. Experimental results show that the state features learned by the state prediction network significantly improve the sample efficiency of the algorithm, and the model is more stable when the multi-agent environment fluctuates.
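
To make the first contribution concrete, the following is a minimal sketch of state augmentation on an offline batch. It assumes simple Gaussian perturbation of logged states; the noise scale, number of copies, and tensor layout are illustrative assumptions, not values specified in the thesis.

```python
import torch

def augment_states(batch_states, noise_std=0.01, num_copies=1):
    """Expand an offline batch by perturbing states with small Gaussian noise.

    batch_states: tensor of shape (batch, state_dim) drawn from the offline
    dataset. noise_std and num_copies are illustrative hyper-parameters.
    """
    augmented = [batch_states]
    for _ in range(num_copies):
        noise = torch.randn_like(batch_states) * noise_std
        augmented.append(batch_states + noise)
    # Concatenate originals and perturbed copies along the batch dimension,
    # giving the conservative Q-learning update a locally "explored"
    # neighbourhood around each logged state.
    return torch.cat(augmented, dim=0)
```

The augmented batch can then be fed to the MACQL update in place of the original batch, which is how such an augmentation would interact with the baseline under these assumptions.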
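
For the second contribution, the following is a minimal sketch of a self-supervised state prediction auxiliary task attached to a QMIX-style learner. The predictor architecture, input choice (global state plus joint action), and loss weight are assumptions for illustration; the thesis only states that a state prediction auxiliary task is added to QMIX.

```python
import torch
import torch.nn as nn

class StatePredictor(nn.Module):
    """Auxiliary head that predicts the next global state from the current
    state and the joint action (hypothetical architecture)."""

    def __init__(self, state_dim, joint_action_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + joint_action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state, joint_action):
        return self.net(torch.cat([state, joint_action], dim=-1))


def total_loss(td_loss, predictor, state, joint_action, next_state, aux_weight=0.1):
    # Self-supervised prediction error on the next state; adding it to the
    # QMIX TD loss encourages the learned state representation to capture
    # environment dynamics, which is the mechanism the abstract describes.
    pred_next = predictor(state, joint_action)
    aux_loss = nn.functional.mse_loss(pred_next, next_state)
    return td_loss + aux_weight * aux_loss
```

In this sketch the auxiliary loss is simply added to the standard TD loss before back-propagation, so the representation is shaped jointly by value learning and state prediction.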