In recent years, with the success of AlphaGo and AlphaStar, multi-agent reinforcement learning, as an important branch of reinforcement learning, has developed rapidly. Unlike AlphaStar's macro-level task management, most current multi-agent reinforcement learning models concentrate on micro-operation management, and cooperative multi-agent reinforcement learning tasks in particular require an algorithmic model to control multiple agents so that they cooperate to complete a task. Although many strong algorithms and models exist for cooperative multi-agent reinforcement learning, there remains much room for improvement, such as unsatisfactory performance in some complex scenarios and an unstable learning process when training multiple agents. In existing multi-agent hybrid methods based on value function decomposition, the structure of the mixing model is too simple, so the family of functions the model can represent is small; as a result, these methods cannot achieve good results in some complex environments.

This thesis proposes the Dueling Transform Network model (referred to as ADTL_mix) for cooperative multi-agent reinforcement learning tasks. By introducing the dueling structure from single-agent reinforcement learning into the multi-agent setting and extending it, the mixing network is divided into a state-value mixing network and an advantage-value mixing network. The state-value mixing network uses an attention-based structure to mix the individual state value functions into a joint state value function, while the advantage-value mixing network directly adopts the hypernetwork mixing structure of the QMIX model. The joint action value function is then obtained by forward mixing from these two angles, and the individual agent networks are jointly optimized from the two perspectives; equations and an implementation sketch of this decomposition are given below. The purpose of introducing the dueling transform network is to address the phenomenon that, in many cases, the magnitude of the action value function is independent of the specific action each agent chooses.

To address the instability of the training process, this thesis uses a learning rate decay method based on the cumulative reward value, which adjusts the learning rate at every step or at fixed intervals (a sketch of such a schedule is also given below), together with a transformation model, based on the correspondence between global observations and local observations, that eliminates the differences among local observations. Together, these two methods stabilize the training process. The overall model structure is an improvement on the QMIX model, and the stabilization methods used in this thesis apply not only to the proposed model but to any multi-agent reinforcement learning method based on value functions.

Experiments are carried out on the StarCraft Multi-Agent Challenge (referred to as SMAC), using several representative maps of different difficulty levels. Comparative experiments demonstrate that the dueling transform model proposed in this thesis significantly improves the win rate and training stability on tasks of different difficulty levels compared with previous methods. Finally, the effectiveness of the proposed method is further confirmed by a self-comparison experiment against a model that only changes the agent structure to a dueling structure.
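To make the two-branch decomposition concrete, the following equations give a minimal formulation. The per-agent dueling aggregation follows the standard form of the single-agent dueling network; the additive recombination at the joint level and the operator names f_att and f_hyper are illustrative assumptions, with the exact mixing operators defined in the thesis body.

    Q_i(\tau_i, u_i) = V_i(\tau_i) + A_i(\tau_i, u_i)
                       - \frac{1}{|U|} \sum_{u'} A_i(\tau_i, u')
    V_{tot}(s) = f_{att}\big(V_1, \dots, V_n;\, s\big)
                 % attention-based state-value mixing
    A_{tot}(\boldsymbol{\tau}, \mathbf{u}, s)
                 = f_{hyper}\big(A_1, \dots, A_n;\, s\big)
                 % QMIX-style hypernetwork advantage mixing
    Q_{tot}(\boldsymbol{\tau}, \mathbf{u}, s)
                 = V_{tot}(s) + A_{tot}(\boldsymbol{\tau}, \mathbf{u}, s)

Subtracting the mean advantage inside each agent's head keeps the decomposition identifiable, since otherwise a constant could be shifted freely between V_i and A_i.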
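As an illustration of this two-branch mixing, the following PyTorch sketch pairs an attention-based state-value mixer with a QMIX-style monotonic hypernetwork mixer for the advantage values. All module names, dimensions (N_AGENTS, STATE_DIM, EMBED_DIM), and the single-layer attention layout are assumptions for illustration, not the exact architecture of ADTL_mix.

    # Illustrative sketch only: layer sizes and the attention layout are
    # assumptions; only the two-branch structure follows the abstract.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_AGENTS, STATE_DIM, EMBED_DIM = 5, 48, 32

    class AttentionStateValueMixer(nn.Module):
        """Mixes per-agent state values V_i into a joint V_tot using
        attention weights computed from the global state."""
        def __init__(self):
            super().__init__()
            self.key = nn.Linear(STATE_DIM, N_AGENTS)  # one score per agent

        def forward(self, v_agents, state):
            # v_agents: (batch, n_agents), state: (batch, state_dim)
            attn = F.softmax(self.key(state), dim=-1)
            return (attn * v_agents).sum(dim=-1, keepdim=True)  # (batch, 1)

    class HyperAdvantageMixer(nn.Module):
        """QMIX-style mixer: hypernetworks conditioned on the state produce
        non-negative weights, so dQ_tot/dA_i >= 0 (monotonicity)."""
        def __init__(self):
            super().__init__()
            self.hyper_w1 = nn.Linear(STATE_DIM, N_AGENTS * EMBED_DIM)
            self.hyper_b1 = nn.Linear(STATE_DIM, EMBED_DIM)
            self.hyper_w2 = nn.Linear(STATE_DIM, EMBED_DIM)
            self.hyper_b2 = nn.Sequential(nn.Linear(STATE_DIM, EMBED_DIM),
                                          nn.ReLU(),
                                          nn.Linear(EMBED_DIM, 1))

        def forward(self, a_agents, state):
            b = a_agents.size(0)
            w1 = torch.abs(self.hyper_w1(state)).view(b, N_AGENTS, EMBED_DIM)
            b1 = self.hyper_b1(state).view(b, 1, EMBED_DIM)
            h = F.elu(torch.bmm(a_agents.unsqueeze(1), w1) + b1)
            w2 = torch.abs(self.hyper_w2(state)).view(b, EMBED_DIM, 1)
            b2 = self.hyper_b2(state).view(b, 1, 1)
            return (torch.bmm(h, w2) + b2).view(b, 1)  # (batch, 1)

    class DuelingMixer(nn.Module):
        """Q_tot = V_tot(s) + A_tot(s, a), mixed from two separate branches."""
        def __init__(self):
            super().__init__()
            self.v_mixer = AttentionStateValueMixer()
            self.a_mixer = HyperAdvantageMixer()

        def forward(self, v_agents, a_agents, state):
            return self.v_mixer(v_agents, state) + self.a_mixer(a_agents, state)

    # Usage: per-agent values/advantages plus the global state yield Q_tot.
    mixer = DuelingMixer()
    q_tot = mixer(torch.randn(8, N_AGENTS), torch.randn(8, N_AGENTS),
                  torch.randn(8, STATE_DIM))  # shape (8, 1)

Keeping the monotonicity constraint only on the advantage branch leaves the state-value branch free to use an unconstrained attention mixture, which is one way to enlarge the family of joint functions the mixer can represent.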
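The learning-rate decay driven by the cumulative reward could be realized, for example, as follows. The specific trigger rule used here (shrink the step size once the mean return of the most recent episodes exceeds that of the preceding window) and all parameter names are hypothetical choices for illustration; the thesis defines its own decay criterion and interval.

    # Illustrative sketch of a cumulative-reward-driven learning-rate
    # schedule; the trigger rule and hyperparameters are assumptions.
    from collections import deque

    class RewardBasedLRScheduler:
        def __init__(self, optimizer, window=100, decay=0.5, min_lr=1e-5):
            self.optimizer = optimizer  # any object exposing param_groups
            self.window = window
            self.decay = decay
            self.min_lr = min_lr
            self.returns = deque(maxlen=2 * window)

        def step(self, episode_return):
            self.returns.append(episode_return)
            if len(self.returns) < 2 * self.window:
                return
            old = sum(list(self.returns)[:self.window]) / self.window
            new = sum(list(self.returns)[self.window:]) / self.window
            # When the cumulative reward has clearly improved, reduce the
            # learning rate so later updates fine-tune rather than overshoot.
            if new > old:
                for g in self.optimizer.param_groups:
                    g["lr"] = max(g["lr"] * self.decay, self.min_lr)
                self.returns.clear()

Because the schedule only reads episode returns and the optimizer's parameter groups, it is agnostic to the underlying model, which matches the claim that the stabilization methods apply to any value-function-based multi-agent method.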