
Research On Flight Conflict Resolution Strategy Based On Cooperative Multi-agent Reinforcement Learning

Posted on: 2022-10-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H L Wu  Full Text: PDF
GTID: 1482306734971859  Subject: Computer Science and Technology
Abstract/Summary:
With the development of air transportation, air traffic is becoming more and more congested. Meanwhile, under the flexible use of airspace, the uncertainty of aircraft trajectories has increased, and with it the risk of flight conflicts. To keep air transportation orderly, the demand for intelligent conflict resolution strategies is becoming ever more urgent. This thesis applies cooperative Multi-Agent Reinforcement Learning (MARL) to the multi-aircraft flight conflict resolution task. First, the task is modeled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Then, the key technologies of cooperative MARL are studied: how each agent uses the global reward to estimate the joint action value more accurately, how to learn decentralized policies efficiently from the joint action value, and how to make the decentralized policies cooperate with one another. Finally, the improved techniques are integrated into a unified cooperative MARL method, which is applied to multi-aircraft flight conflict scenarios to verify its effectiveness. The contributions of this thesis are as follows:

(1) A Sub-maximization overestimation reduction method based on double averaging action values (Sub-AVG) is proposed, which reduces the harm of overestimation error so that action values are estimated more accurately in cooperative MARL. It is first proved that an overestimation error exists in value-function decomposition methods, and a lower bound on this error is given; this lower bound is always greater than zero. Sub-AVG therefore maintains several target networks simultaneously, preserving action values estimated in different training periods, and discards the larger estimates to eliminate excessive overestimation, which yields an overall lower update target. Experiments confirm that overestimation error does exist in cooperative MARL methods and harms policy performance, and that the proposed method effectively reduces the estimated action values and obtains better policies (see the first sketch after contribution (2)).

(2) A Generating Individual Reward (GIIR) method for cooperative multi-agent reinforcement learning is proposed, which guides agents to make better use of the joint action value when learning decentralized policies. An intrinsic reward encoder generates an individual reward distribution from each agent's partial observation, and the concrete individual reward is obtained by sampling from this distribution. Because the individual reward and the joint action value share the goal of helping the multi-agent system obtain more global reward, the individual reward can be optimized together with the joint action value: hypernetworks are introduced, and the hypernetworks, the agent networks, and the mixing network are together regarded as the individual reward decoder, so that the individual reward is trained end-to-end with the joint action value. At execution time, each agent network uses its individual reward to evaluate its individual action value and thereby forms its decentralized policy. Experiments show that the introduced individual reward helps the multi-agent system obtain more global reward and achieve better policy performance (see the second sketch below).
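To make the Sub-AVG target of contribution (1) concrete, the following is a minimal PyTorch sketch. It is not the thesis's implementation: the network interfaces, the number of maintained target networks, and the rule of keeping only estimates at or below the cross-network mean are illustrative assumptions based on the description above.

    import torch

    def sub_avg_target(target_nets, next_obs, rewards, gamma=0.99):
        """Sub-AVG-style Q-learning target (illustrative sketch).

        Several target networks, snapshotted in different training
        periods, each estimate the greedy next-state value; estimates
        above the cross-network mean are discarded and the survivors
        are averaged, yielding an overall lower update target that
        damps overestimation error.
        """
        # (K, batch): greedy next-state value from each target network.
        q_next = torch.stack(
            [net(next_obs).max(dim=-1).values for net in target_nets]
        )
        mean_q = q_next.mean(dim=0, keepdim=True)
        keep = (q_next <= mean_q).float()      # drop the larger estimates
        sub_avg = (q_next * keep).sum(0) / keep.sum(0).clamp(min=1.0)
        return rewards + gamma * sub_avg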
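Contribution (2)'s reward generation can likewise be pictured with the sketch below: a small encoder maps a partial observation to a distribution over the individual reward and samples it with the reparameterization trick, so gradients from the joint action-value loss can reach the encoder end-to-end. The diagonal Gaussian parameterization and the layer sizes are assumptions, not the thesis's exact design.

    import torch
    import torch.nn as nn

    class IntrinsicRewardEncoder(nn.Module):
        """Maps an agent's partial observation to a distribution over
        its individual reward (here a Gaussian, an assumption of this
        sketch) and samples from it differentiably, so the individual
        reward trains end-to-end with the joint action value."""

        def __init__(self, obs_dim, hidden_dim=64):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden_dim),
                                       nn.ReLU())
            self.mu = nn.Linear(hidden_dim, 1)
            self.log_std = nn.Linear(hidden_dim, 1)

        def forward(self, obs):
            h = self.trunk(obs)
            mu = self.mu(h)
            std = self.log_std(h).clamp(-5, 2).exp()
            # Reparameterization trick: sampling stays differentiable.
            return mu + std * torch.randn_like(std)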
(3) An Agreement Generation method based on unified behavior consciousness for Multi-agent reinforcement learning (AGMA) is proposed, which reaches an agreement among agents by unifying the behavior consciousness of all agents. First, each agent generates from its partial observation a situation perception containing information about all agents, and the mutual information between the situation perception and the global state is increased to extract more global-state information; a variational distribution based on the global state is introduced to assist in estimating this mutual information. Then, each agent maintains a high-level behavior consciousness on top of the situation perception, and the agreement is obtained by unifying the behavior consciousness across agents. Finally, so that the behavior consciousness can guide behavior decisions once the agreement is reached, a Hyper Recurrent Neural Network (Hyper RNN) modifies the parameters of the RNN in the agent network according to the behavior consciousness, guiding the behavior decision through its effect on the individual action-value estimate. Experiments show that the method achieves better policy performance and that the agreement, the situation perception, and the Hyper RNN all contribute to the improvement (sketches of both mechanisms follow contribution (4)).

(4) The cooperative MARL method is applied to the resolution of multi-aircraft flight conflicts. The following characteristics of the task are taken into account: the resolution strategy for the whole airspace should arise from the cooperation of all aircraft; each aircraft's resolution maneuver is a synthesis of three adjustment actions; each aircraft observes the airspace only partially; and each aircraft should return to its original route quickly after the conflict is resolved. The task is therefore modeled as a Dec-POMDP (a toy sketch of the reward and observation structure appears below). The improved cooperative MARL techniques of this thesis are then integrated into a unified method and applied to different flight conflict scenarios to study the resolution strategy. The experimental results show that the cooperative MARL method is efficient on these tasks, and typical conflict resolution strategies are given under both comprehensive action exploration and specific-preference action exploration.
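For contribution (3), the mutual information between the situation perception c and the global state s can be lower-bounded in the standard variational way, I(c; s) >= H(c) + E[log q(c | s)], with q the variational distribution based on the global state. The sketch below minimizes -E[log q(c | s)] with a unit-variance Gaussian q, which reduces to a squared error; the Gaussian choice and the handling of the entropy term outside this module are assumptions, since the abstract does not fix them.

    import torch
    import torch.nn as nn

    class VariationalMIBound(nn.Module):
        """Variational lower bound on I(c; s): maximizing
        E[log q(c | s)] tightens the bound, so the negative
        log-likelihood of a unit-variance Gaussian q(c | s)
        serves as the loss to minimize (sketch)."""

        def __init__(self, s_dim, c_dim, hidden=64):
            super().__init__()
            self.q = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, c_dim))

        def forward(self, s, c):
            # -E[log q(c | s)] up to additive constants.
            return 0.5 * ((self.q(s) - c) ** 2).sum(-1).mean()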
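The Hyper RNN of contribution (3) can be pictured as a hypernetwork that generates part of the recurrent cell's weights from the agreed behavior consciousness z. In this sketch only the input projection is hyper-generated, and the GRU cell type is an assumption made for brevity.

    import torch
    import torch.nn as nn

    class HyperRNNCell(nn.Module):
        """Recurrent cell whose input projection is generated by a
        hypernetwork conditioned on the behavior consciousness z, so
        the inter-agent agreement modulates the agent's hidden
        dynamics and hence its individual action-value estimate."""

        def __init__(self, obs_dim, hid_dim, z_dim):
            super().__init__()
            self.obs_dim, self.hid_dim = obs_dim, hid_dim
            self.hyper_w = nn.Linear(z_dim, obs_dim * hid_dim)
            self.cell = nn.GRUCell(hid_dim, hid_dim)

        def forward(self, obs, h, z):
            # (batch, obs_dim, hid_dim) input weights generated from z.
            w = self.hyper_w(z).view(-1, self.obs_dim, self.hid_dim)
            x = torch.bmm(obs.unsqueeze(1), w).squeeze(1)
            return self.cell(x, h)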
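Finally, for contribution (4), a toy sketch of the Dec-POMDP's shared reward and partial observation: losses of separation are penalized pairwise, route deviation is penalized per aircraft so that resolved aircraft return to the filed route quickly, and each agent observes only nearby traffic. All thresholds and weights are illustrative assumptions, not the thesis's scenario parameters.

    import numpy as np

    SEPARATION_NM = 5.0   # loss-of-separation threshold (assumed)
    OBS_RANGE_NM = 50.0   # partial-observation radius (assumed)

    def global_reward(positions, route_points):
        """Shared Dec-POMDP reward for one step: penalize every pair
        of aircraft that loses separation, and penalize each
        aircraft's deviation from its original route."""
        r = 0.0
        n = len(positions)
        for i in range(n):
            for j in range(i + 1, n):
                if np.linalg.norm(positions[i] - positions[j]) < SEPARATION_NM:
                    r -= 10.0                                # conflict penalty
            r -= 0.1 * np.linalg.norm(positions[i] - route_points[i])
        return r

    def local_observation(i, positions):
        """Agent i sees only the relative positions of aircraft
        within its observation range (partial observability)."""
        rel = positions - positions[i]
        near = np.linalg.norm(rel, axis=-1) < OBS_RANGE_NM
        near[i] = False
        return rel[near]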
Keywords/Search Tags: cooperative multi-agent reinforcement learning, multiple aircraft flight conflict resolution, overestimation error, credit assignment of global reward, cooperation