| Reinforcement learning is an important branch of machine learning. It describes the process by which an agent learns a policy that maximizes reward or achieves a task objective through continuous interaction with its environment. Unlike other machine learning methods, reinforcement learning emphasizes interactive, goal-oriented learning. With the development of deep learning, reinforcement learning supported by neural networks gives agents stronger decision-making ability. Multi-agent reinforcement learning, a branch of reinforcement learning, studies the problem from the perspective of multiple agents and has a wide range of applications in traffic scheduling, game AI, and autonomous driving. This thesis studies multi-agent reinforcement learning algorithms and their application. The main contents are as follows: (1) To address the instability, limited scalability, and partial observability of multi-agent reinforcement learning algorithms, we propose the multi-agent information filter deep deterministic policy gradient (MIFDDPG) algorithm, built on the "centralized training and decentralized execution" framework. Information filtering consists of two parts. In the actor, a parameter-sharing mechanism and a GRU mechanism are introduced so that each agent can combine the policies of all agents and obtain a dynamic perception of the environment. In the critic, a dual-network mechanism is introduced: an attention network and a fully connected network are trained jointly to obtain an accurate estimate of each agent's state-action value. Experiments are carried out in the continuous-action MPE environment and the more complex discrete-action SMAC environment. The results show that the proposed algorithm improves learning efficiency and that its final performance is better than that of existing algorithms. (2) Building on this study of multi-agent reinforcement learning algorithms, this thesis attempts to apply the algorithm to the field of
mobile edge computing. This thesis proposes an offloading strategy assisted by multiple unmanned aerial vehicles (UAVs) based on the MIFDDPG algorithm, and designs the state, action, and reward for this scenario. The final experiment shows that the MIFDDPG algorithm is more effective in resource allocation and path planning than traditional offloading strategies and other reinforcement learning algorithms. (3) We have built a data metrics monitoring platform that stores the UAV data and system resource scheduling information produced during operation in an InfluxDB database and displays them through the Grafana visualization tool. This allows researchers to analyze more intuitively how the data change across different stages and to identify the key factors that affect UAV performance in each stage. |