
Research On Learning Driven Behavior Modeling Methods For Decision Making Of Computer Generated Forces (CGFs)

Posted on: 2019-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Zhang
GTID: 1362330611493030
Subject: Control Science and Engineering

Abstract/Summary:
Computer Generated Forces (CGFs) are autonomous or semi-autonomous virtual agents that act out the roles of real forces in combat simulation. Modeling realistic and adaptive decision-making behavior for CGFs has long been a challenging problem. Traditional modeling methods still burden developers with heavy and inefficient knowledge engineering, yet yield rigid, predictable behavior that cannot meet the requirements of rapid development and varied, plausible simulation experiences in training and experimental applications. To overcome these issues, this thesis investigates how machine learning methods can facilitate decision-making modeling in military simulation. The core contributions and innovations of the thesis are as follows:

(1) A learning-driven behavior modeling framework for CGF decision-making. After analyzing the requirements on model representation, performance, and function, a learning-driven behavior modeling framework is proposed with Behavior Trees (BTs) as the basic model representation. Three key learning techniques are combined with manual BT modeling: evolving BTs for offline model generation, hierarchical reinforcement learning for online BT optimization, and multi-agent reinforcement learning (MARL) for extending to coordinated policies. The integrated framework provides a formal modeling flow and application mode to support multi-stage behavior modeling.

(2) An evolving BT method with hybrid constraints for offline generation of CGF decision-making models. To overcome the difficulties of knowledge acquisition, an evolving BT method with hybrid constraints is proposed that generates CGF decision-making models based on behavior evaluation metrics from domain experts. The method adopts hybrid constraints to accelerate learning and find better solutions without sacrificing domain independence. On the one hand, a static structural constraint based on BT design patterns is proposed to generate more desirable initial BT individuals and reduce the search space. On the other hand, a dynamic constraint based on frequent sub-tree mining is designed to accelerate the accumulation of preponderant subtrees during evolution. Preliminary experiments on the Pac-Man benchmark show that the proposed approach outperforms its competitors, achieving better final behavior performance within fewer episodes. Moreover, the generated models and subtrees are human-readable and easy for domain experts to fine-tune.

(3) A model optimization method based on MAXQ hierarchical reinforcement learning to improve CGF models online under given domain constraints. A novel approach named MAXQ-BT is proposed to facilitate constrained, adaptive behavior learning online. To tackle the slow convergence that arises when multiple selector nodes learn simultaneously online, the thesis shows how a hand-designed BT relates to a MAXQ task graph, and constructs a MAXQ-BT learning framework with new learning rules to improve the policies of the multiple selectors. The learned knowledge is then used to construct condition nodes and adjust the priority of each selected node. Preliminary experiments on a predator-prey simulation scenario with different parameter settings show that the proposed method outperforms its competitors, achieving better final behavior performance with fewer learning episodes, which facilitates online BT optimization and leads to more adaptive and robust models.

(4) A coordinated learning method based on model difference degree that adapts individual policies to multi-agent decision-making settings. To solve the problem of online coordination when extending to the multi-agent setting, a coordinated learning method based on model difference degree is proposed. The method first divides the learning process into independent learning and joint learning in coordinated states. To identify coordinated states under flexible assumptions about domain structure or agent homogeneity, the method introduces sample grouping and a more accurate model-difference-degree metric. These mechanisms accurately measure the difference between agents performing a task collectively and performing it separately. Experimental results on a series of gridworld simulation scenarios show that the proposed approach outperforms its competitors by improving the average agent reward per step, and that it works well in broader scenarios.

Finally, conclusions are drawn and open problems for future research are discussed.
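To make the basic model representation concrete, the behavior-tree formalism used throughout the framework can be sketched as composite Selector/Sequence nodes over leaf conditions and actions. The node classes, leaf names, and tick semantics below are a minimal illustrative sketch under standard BT conventions, not the thesis's actual implementation:

```python
# Minimal behavior-tree sketch (illustrative; not the thesis's model).
SUCCESS, FAILURE = "success", "failure"

class Leaf:
    """Condition or action leaf: wraps a predicate/effect on the state."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self, state):
        return SUCCESS if self.fn(state) else FAILURE

class Selector:
    """Ticks children in priority order; succeeds on the first success."""
    def __init__(self, children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE

class Sequence:
    """Ticks children in order; fails on the first failure."""
    def __init__(self, children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

trace = []
def action(name):
    # Action leaf that records its execution and reports success.
    return Leaf(name, lambda s: trace.append(name) or True)

# Hypothetical CGF root: retreat when health is low, otherwise engage.
root = Selector([
    Sequence([Leaf("low_health", lambda s: s["health"] < 30),
              action("retreat")]),
    action("engage"),
])
root.tick({"health": 20})
print(trace)  # ['retreat']
```

The selector's priority ordering is exactly what the later MAXQ-BT optimization adjusts online: learned values decide which child a selector tries first.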
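The offline evolving-BT idea of contribution (2) can be sketched as genetic programming over tree individuals: a static structural constraint shapes the initial population (composites at inner levels, actions only at leaves), and a fitness function stands in for the expert behavior evaluation metric. The action names, fitness target, and operators here are toy assumptions, not the thesis's hybrid-constraint method:

```python
import random

random.seed(0)
ACTIONS = ["patrol", "engage", "retreat", "regroup"]  # hypothetical CGF leaves

def random_tree(depth=2):
    # Static structural constraint: composites internally, actions at leaves.
    if depth == 0:
        return random.choice(ACTIONS)
    return [random.choice(["selector", "sequence"]),
            random_tree(depth - 1), random_tree(depth - 1)]

def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    return [l for child in tree[1:] for l in leaves(child)]

def fitness(tree, target=("retreat", "engage")):
    # Toy stand-in for the expert metric: reward trees whose leaf
    # ordering contains the target action subsequence.
    score, i = 0, 0
    for l in leaves(tree):
        if i < len(target) and l == target[i]:
            i, score = i + 1, score + 1
    return score

def mutate(tree):
    # Subtree mutation: occasionally replace a node with a fresh subtree.
    if isinstance(tree, str) or random.random() < 0.3:
        return random_tree(1)
    t = list(tree)
    i = random.randrange(1, len(t))
    t[i] = mutate(t[i])
    return t

# Elitist generational loop: keep the 10 best, refill with mutants.
pop = [random_tree() for _ in range(30)]
for gen in range(20):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(20)]
best = max(pop, key=fitness)
print(fitness(best))
```

Because individuals stay as readable trees throughout, the evolved result can be inspected and fine-tuned by hand, which mirrors the human-readability point made above.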
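The core of contribution (4), deciding where joint learning is worth its cost, can be sketched as comparing the value an agent obtains acting independently against the value under joint action selection, and flagging states where the gap is large. The Q-tables, state names, and threshold below are toy data; the thesis's actual metric and sample-grouping scheme are richer than this sketch:

```python
# Illustrative "model difference degree" for coordinated-state detection.
def difference_degree(q_independent, q_joint):
    """Per-state gap between the best joint value and the best independent value."""
    return {s: max(q_joint[s].values()) - max(q_independent[s].values())
            for s in q_independent}

def coordinated_states(q_independent, q_joint, threshold=0.5):
    # States whose gap exceeds the threshold switch to joint learning.
    gaps = difference_degree(q_independent, q_joint)
    return {s for s, gap in gaps.items() if gap > threshold}

# Toy example: in "doorway" the agents must coordinate (joint policy is
# much better); in "open_field" independent learning already suffices.
q_ind = {"doorway":    {"go": 1.0, "wait": 0.8},
         "open_field": {"go": 2.0, "wait": 0.5}}
q_jnt = {"doorway":    {("go", "wait"): 2.5, ("go", "go"): 0.2},
         "open_field": {("go", "go"): 2.1, ("wait", "wait"): 0.4}}
print(coordinated_states(q_ind, q_jnt))  # {'doorway'}
```

Restricting joint learning to such states keeps the method tractable: everywhere else the agents learn independently, which is what drives the per-step reward improvement reported on the gridworld scenarios.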
Keywords/Search Tags: Combat Simulation, Computer Generated Forces, Adaptive Behavior Modeling, Intelligent Decision Making, Behavior Trees, Genetic Programming, Reinforcement Learning, Multi-agent Reinforcement Learning