
Research On Autonomous Evasion Task Decision-making Method For Multi-agent System

Posted on: 2022-08-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhao
Full Text: PDF
GTID: 1482306569486394
Subject: Aeronautical and Astronautical Science and Technology
Abstract/Summary:
With the rapid development of the aerospace industry, more and more equipment performs tasks through group cooperation and exhibits the typical characteristics of a multi-agent system. Such equipment is well suited to analysis and modeling as a multi-agent system, and the demand for distributed autonomous decision-making technology is increasingly prominent. Autonomous evasion is the most direct and effective way to ensure flight safety. However, most current research relies on static global planning algorithms, which struggle to meet the real-time cooperation needs of multiple entities in dynamic scenarios. To solve the problem of multi-agent cooperative evasion, this thesis applies the multi-agent system approach to decision-making for evasion tasks. Multi-agent systems offer autonomy, high efficiency and scalability, and this thesis combines them with reinforcement learning to design a new decision algorithm for aircraft. The problem of multi-agent autonomous evasion task decision-making is studied in three typical scenarios: spacecraft anti-interception, UAV collision avoidance and space manipulator trajectory planning. Real-time multi-agent decision-making is achieved under realistic constraints. The main research results are as follows.

Based on motion analysis, a mathematical model of the interaction between agent and environment is given. For evasion decision-making in multi-agent systems, a decision-making model is established that takes partially observable constraints into account. Combined with game theory, the multi-agent Markov game is discussed. The thesis analyzes the design of conventional reward functions and gives three typical approaches to solving sequential decision problems.

In the area of multi-agent reinforcement learning, the decision-making process is analyzed for the evasive maneuver scenario and the space manipulator capture scenario. The policy gradient method is applied to improve the multi-agent system. A new actor-critic reinforcement learning method based on policy coordination and credit assignment is proposed to solve the problem of training and policy improvement for decision-makers under globally observable conditions, and its convergence is analyzed. The neural network structure and algorithm flow of each key stage are designed according to the task requirements. Training is carried out in multiple missions, including anti-interception and space manipulator capture, and the cumulative return and success rate are compared and analyzed to verify the correctness and effectiveness of the proposed method.

For the practical engineering application of reinforcement learning algorithms, the constraints that typical task scenarios place on decision efficiency are analyzed. A neural network structure for task decision-making is designed for the problem scenario, with compression methods designed for its different parts. Based on the clustering and quantization of neural network weights, an adaptive hierarchical pruning method is proposed. The method dynamically prunes and compresses the target neural network through retraining, which speeds up the decision-maker and reduces its storage footprint. A reinforcement learning system is designed for task scenarios under partially observable conditions, and the design of the reward function is given in detail. The proposed method is verified by simulation in a high-density UAV scenario in a finite airspace and in an anti-interception scenario. The performance of the algorithm is analyzed in terms of decision-making speed, cumulative return and success rate, and the adaptability of the proposed reinforcement learning method to environments with a varying number of entities is summarized.

For the problem of sparse rewards in the task environment, the limitations imposed by task scenario constraints and by conventional reinforcement learning algorithms are analyzed, and a case evaluation mechanism is designed. An inverse value method is proposed to improve the learning algorithm, addressing delayed reward assignment and the low learning efficiency of systems without reward guidance. A self-learning system is designed based on Markov game theory, and the convergence of the proposed algorithm is analyzed using heuristic search theory. The effect of disturbed state inputs is analyzed, and a finite state machine is designed for comparative analysis. The advantages of the algorithm and directions for improvement are analyzed. In simulation, the decision-makers obtained in the preceding chapters are compared and analyzed, which verifies the correctness and the performance advantages of the proposed algorithm.

This thesis investigates multi-agent decision-making technology, studies the important directions of credit assignment, policy coordination, neural network acceleration and sparse reward, and improves the survival rate of aerospace hardware during task execution. The research results have reference value for the development of aerospace safety assurance technology.
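The abstract does not spell out the adaptive hierarchical pruning method, so the following is only a minimal sketch of the two generic ingredients it names: pruning small weights layer by layer, then clustering the surviving weights so each layer stores a few shared values. The function names, the fixed per-layer sparsities and the 1-D k-means step are all assumptions made for illustration, and the retraining and adaptive schedule described in the thesis are omitted.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def cluster_weights(weights, k, iters=20):
    """Quantize non-zero weights to at most k shared values using 1-D k-means."""
    nonzero = [w for w in weights if w != 0.0]
    if not nonzero:
        return list(weights), []
    lo, hi = min(nonzero), max(nonzero)
    # Assumed initialisation: centroids spread linearly over the weight range.
    centroids = [lo + (hi - lo) * j / max(k - 1, 1) for j in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for w in nonzero:
            nearest = min(range(k), key=lambda c: abs(w - centroids[c]))
            buckets[nearest].append(w)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    # Pruned weights stay zero; the rest snap to their nearest centroid.
    quantized = [0.0 if w == 0.0 else min(centroids, key=lambda c: abs(w - c))
                 for w in weights]
    return quantized, centroids


def compress_layers(layers, sparsities, k):
    """Prune each layer with its own sparsity, then share weights within it."""
    return [cluster_weights(prune_by_magnitude(w, s), k)[0]
            for w, s in zip(layers, sparsities)]


if __name__ == "__main__":
    # Hypothetical two-layer network, each layer given as a flat list.
    layers = [[0.9, -0.05, 0.1, -0.8, 0.02, 0.85], [0.5, 0.01, -0.5, 0.02]]
    for layer in compress_layers(layers, sparsities=[0.5, 0.5], k=2):
        print(layer)
```

In a sketch like this, storage shrinks because each layer only needs its small centroid table plus a short index per surviving weight, and inference can skip the zeroed entries. An adaptive scheme would pick the per-layer sparsity from how sensitive each layer is and would retrain between pruning rounds.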
Keywords/Search Tags:multi-agent system, reinforcement learning, evasion maneuver, neural network optimization, credit assignment, sparse reward