Research On Multi-Agent Cooperative Decision-making Under Empathetic Mechanism

Posted on: 2024-01-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Z Chen
Full Text: PDF
GTID: 1528307376981549
Subject: Control Science and Engineering
Abstract/Summary:
“Swarm intelligence” is a comprehensive technology that breaks through the bottleneck of individual performance by means of swarm cooperation. The transition of artificial intelligence from individual intelligence to “swarm intelligence” closely parallels the evolution of dominant species into large-scale communities in the biological world, which has prompted many biomimetic studies based on eusocial insects with clear organizational structures. However, this model ignores individual specificity and acquired creativity, which limits its further expansion. In contrast, human cooperation patterns, which are shaped by both genes and culture, are characterized by “harmony in diversity” and are more in line with the vision of “swarm intelligence”. In fact, “How human cooperative behavior develops”, one of the 125 challenging scientific questions put forward in the journal Science, has long been studied and discussed in social psychology, cognitive neuroscience, and behavioral economics. A commonly accepted view is that empathy, the ability to share the emotions of others and to understand their perspectives and motivations, is closely related to human altruism, prosocial behavior, and the formation of morality. Although research on the biological mechanism of empathy is still in its infancy, some analyses of its functions and characteristics have reached a basic consensus: the empathy mechanism can participate in the closed loop of perception, decision, and action through spontaneous affective empathy and active cognitive empathy. In this context, this dissertation abstracts the empathy mechanism from the existing consensus conclusions and explores in depth how introducing it affects the collaborative decision-making of multi-agent systems, with the aim of helping to fill an interdisciplinary gap in the field of artificial intelligence. Specifically, the main work is as follows.

Firstly, to address the problem that existing utility-coupled models cannot characterize the anisotropy of deep empathy or distinguish between empathy modes, this dissertation proposes an empathy construction method based on the non-stationary Markov chain. The method parameterizes the self-other separation mechanism, which neuropsychology proposes as the basis of empathy effectiveness, into the ego reentry coefficients of a class of non-stationary Markov chains. The proposed model has significant properties of internal absorbability, inhibition, concentration, and anisotropy; it can effectively distinguish affective empathy from cognitive empathy according to the iterative direction of the transition probability matrix, and it better fits social data sets. In addition, the confidence interpretation of ego reentry makes the empathy model suitable for various scenarios with abiotic participants. Three application examples, in human-machine interaction networks, multi-sensor information fusion, and semi-supervised clustering, show that the empathy model with the self-other separation mechanism can serve as a model paradigm for describing utility transmission and confidence distribution in a network, and can provide model and theoretical support for interaction characteristics analysis and optimization algorithm design in biological or abiotic domains.

Secondly, to address the problem that directly searching for the dominant global strategy corresponding to deep empathy has a high computational complexity, close to O(n^3), this dissertation proposes a candidate strategy elimination algorithm based on upper bound estimation of iterative errors. The core principle of the method is to associate the upper bound of the iteration error of the objective function with the upper bound of the iteration error of the empathy utility. Owing to the attraction property of the internal state in the deep empathy model, the error of the empathy utility decreases as the number of iterations increases, which makes it possible to iteratively prune the set of candidate strategies according to the difference between the strategy under evaluation and the current optimal strategy. For a polynomial objective function, the decision algorithm based on affective utility reduces the algorithm complexity to O(n^y) (1 ≤ y ≤ 2), which effectively improves the efficiency of solving for the dominant strategy in large-scale environments. In addition, the various subproblems covered by the decision-making method under the deep empathy paradigm are analyzed, and the UAV formation selection problem is taken as an example to verify the high efficiency of the algorithm in dealing with large-scale systems.

Then, to address the difficulty of configuring the parameters of the empathy model in a cooperative environment and the lack of collaborative exploration caused by target sharing, this dissertation proposes a design and optimization method for the empathy interaction mechanism based on the maximum entropy principle. In this method, determining the parameters of the empathy model is transformed into a constrained optimization problem, and the solved model is used as the internal protocol of the interactive system to guide the formation of a multi-agent strategy. According to the configuration of the model’s empathy parameters, the interaction modes can be divided into the collective mode, equal mode, oligopolistic mode, and dictatorial mode, so that the mechanism can be matched and extended to different coordination problems. In addition, considering that the lack of exploration caused by reward sharing in collective mode is equivalent to a credit assignment problem associated with the configuration of the empathy parameters, this dissertation further designs an optimization framework for the empathy parameters based on the maximum entropy principle and integrates it into DTDE (Decentralized Training with Decentralized Execution) and CTDE (Centralized Training with Decentralized Execution) reinforcement learning algorithms, respectively. Test results on the matrix game and the cooperative navigation experiment show that the algorithm can effectively improve cooperation efficiency in collective mode.

Furthermore, to address the difficulty of jointly accounting for the expression of individual cooperative intention, sequential rationality, and risk control in a non-cooperative environment, this dissertation proposes an adaptive empathic cooperation and competition method based on neutrality estimation. By introducing empathy as a dynamic contract, the method transforms the problem of reliable interaction between agents in non-cooperative environments into the problem of designing adaptive empathy contracts in a class of differential games. By decoupling the internal functional layers, the design of an empathetic agent is modularized into an emotional model based on subjective well-being, a dynamic empathy contract based on neutrality assessment, and a closed-loop decision based on a gradient ascent algorithm. By analyzing the dynamic behavior of differential matrix games, this dissertation proves that agents with dynamic empathy contracts can perform adaptive group cooperation and competition while their security benefits are ensured. Experimental tests also show that dynamic empathy contracts can effectively stimulate the prosocial behaviors of agents, including altruism, cooperation, and fairness. Compared with other intrinsically driven learning algorithms, the algorithm based on dynamic empathy contracts offers more comprehensive advantages in terms of convergence, fairness, security, adaptability, and structural scalability.

Finally, this dissertation addresses the difficulty of estimating external intentions of cooperation or confrontation and the limited maneuverability of affine formations in a class of complex tactical formation environments. It supplements a method for individually estimating game players’ intentions or environmental situations from the perspective of a third party. Combining this method with a class of formation control methods with high degrees of freedom, a tactical formation regulation and control framework based on pan-empathy estimation is further proposed. In terms of intention estimation, inverse reinforcement learning is used to directly estimate the pan-empathetic parameters that represent a player’s intention in a stochastic game environment. The proposed pan-empathetic parameter estimation method, based on Nash equilibrium and correlated equilibrium assumptions in the non-cooperative environment, can provide a basis for situation assessment and for adjusting subsequent formation strategies. In terms of formation control, to meet the needs of offensive, defensive, and cooperative tasks in different situations, affine formation control is extended to projective formation control. The extended method achieves consistent tracking of arbitrary formations provided that convexity is preserved, increases the controllable degrees of freedom of formations, and provides a theoretical guarantee for complex tactical planning and execution.
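
As a rough illustration of the candidate strategy elimination idea summarized in the abstract, the following Python sketch prunes candidates once a shrinking error bound shows they cannot be optimal. It is a toy under stated assumptions, not the dissertation's algorithm. The utility model is a simple stationary fixed-point coupling u = (1 - a) r + a W u with a row-stochastic matrix W, standing in for the non-stationary Markov chain model. The objective is the sum of utilities. The names `step`, `eliminate`, `W`, `CANDIDATES`, and the coupling weight `a` are all hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def step(W: Matrix, r: List[float], u: List[float], a: float) -> List[float]:
    """One fixed-point update u <- (1 - a) * r + a * W u (toy utility coupling)."""
    n = len(r)
    return [
        (1 - a) * r[i] + a * sum(W[i][j] * u[j] for j in range(n))
        for i in range(n)
    ]


def eliminate(
    W: Matrix,
    rewards: Dict[str, List[float]],
    a: float,
    tol: float = 1e-9,
    max_iter: int = 200,
) -> Tuple[str, int, int]:
    """Return (best candidate, iterations used, total utility updates).

    Assumes W is row-stochastic and 0 < a < 1, so each update contracts the
    sup-norm error by a factor a. Starting from u_0 = 0 with |r_i| <= R, the
    error of the summed objective after k updates is at most n * R * a**k.
    """
    n = len(W)
    r_max = max(abs(x) for r in rewards.values() for x in r)
    est = {name: [0.0] * n for name in rewards}  # u_0 = 0 for every candidate
    alive = set(rewards)
    updates = 0
    k = 0
    for k in range(1, max_iter + 1):
        for name in alive:
            est[name] = step(W, rewards[name], est[name], a)
            updates += 1
        bound = n * r_max * a ** k  # upper bound on the objective error
        scores = {name: sum(est[name]) for name in alive}
        best_lower = max(scores.values()) - bound
        # Keep a candidate only if its upper bound can still reach the
        # lower bound of the current best candidate.
        alive = {name for name in alive if scores[name] + bound >= best_lower}
        if len(alive) == 1 or bound < tol:
            break
    best = max(alive, key=lambda name: sum(est[name]))
    return best, k, updates


# Hypothetical 3-agent example: each candidate strategy yields a reward vector.
W = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]]
CANDIDATES = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 1.0], "C": [0.2, 0.2, 0.2]}

best, iterations, updates = eliminate(W, CANDIDATES, a=0.5)
print(best, iterations, updates)
```

Because the error bound shrinks geometrically, clearly inferior candidates drop out after a few iterations and stop consuming updates, which is the source of the savings over evaluating every candidate to full convergence. The true optimum is never discarded, since its upper bound always stays above the lower bound of any other candidate.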
Keywords/Search Tags:collaborative decision-making, multi-agent system, empathy mechanism, intention estimation, inverse reinforcement learning, affine formation control
Related items