
CLEAN Learning to Improve Coordination and Scalability in Multiagent Systems

Posted on: 2014-10-31
Degree: Ph.D.
Type: Dissertation
University: Oregon State University
Candidate: HolmesParker, Chris
GTID: 1456390008954537
Subject: Engineering
Abstract/Summary:
Recent advances in multiagent learning have led to exciting new capabilities spanning fields as diverse as planetary exploration, air traffic control, military reconnaissance, and airport security. Such algorithms provide a tangible benefit over traditional control algorithms in that they allow fast responses, adapt to dynamic environments, and generally scale well. Unfortunately, because many existing multiagent learning methods are extensions of single-agent approaches, they are inhibited by three key issues: i) they treat the actions of other agents as "environmental noise" in an attempt to simplify the problem, ii) they are slow to converge in large systems because the joint action space grows exponentially in the number of agents, and iii) they frequently rely on an accurate system model being readily available.

This work addresses these three issues sequentially. First, we improve overall learning performance compared to existing state-of-the-art techniques by embracing exploration in learning rather than ignoring it or approximating it away. Within multiagent systems, exploration by individual agents significantly alters the dynamics of the environment in which all agents learn. To address this, we introduce the concept of "private" exploration, which enables each agent to present a stationary baseline policy to the rest of the system so that other agents can learn more efficiently. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards, which use private exploration to remove the negative impact that traditional "public" exploration strategies have on learning in multiagent systems, improving both coordination and performance. Next, we leverage the properties of CLEAN rewards that enable private exploration to let agents evaluate multiple potential actions concurrently in a "batch mode," significantly improving learning speed over the state of the art.

Finally, we improve the real-world applicability of the proposed techniques by reducing their requirements. Specifically, computing CLEAN rewards requires an accurate partial model of the system (i.e., a model of the system objective). Unfortunately, many real-world systems are too complex to be modeled or are not known in advance, so an accurate system model is not available a priori. We address this shortcoming by employing model-based reinforcement learning techniques that enable agents to construct an approximate model of the system objective from their own observations and to use this approximate model to compute their CLEAN rewards.
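To make the mechanism concrete, the following is a minimal sketch, not the dissertation's implementation, of how CLEAN rewards with private, batch-mode exploration could look. It assumes a toy stateless congestion-style task with a known system objective G over the joint action; the objective, agent and action counts, learning rate, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of CLEAN-style learning with private exploration.
# Assumed (not from the dissertation text): a stateless congestion-style
# problem, a known system objective G, and per-agent Q-value tables.

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS, ALPHA = 20, 5, 0.1

def G(joint_action):
    """Toy system objective: rewards spreading agents evenly across actions."""
    counts = np.bincount(joint_action, minlength=N_ACTIONS)
    return -float(np.sum(counts ** 2))

Q = rng.normal(scale=0.01, size=(N_AGENTS, N_ACTIONS))  # small init to break ties

for episode in range(500):
    # 1) Every agent publicly executes its current greedy action, so the joint
    #    behavior other agents learn against contains no exploratory noise.
    public = Q.argmax(axis=1)
    g_public = G(public)

    for i in range(N_AGENTS):
        # 2) Privately, agent i evaluates counterfactual actions offline.
        #    Because this exploration never touches the real system, a whole
        #    batch of actions can be assessed concurrently ("batch mode").
        for c in range(N_ACTIONS):
            counterfactual = public.copy()
            counterfactual[i] = c
            # 3) CLEAN reward: the change in the system objective if agent i
            #    alone had swapped its public action for c.
            clean_reward = G(counterfactual) - g_public
            # 4) Update the value of the privately explored action only.
            Q[i, c] += ALPHA * (clean_reward - Q[i, c])
```

When G is not known a priori, step 3 would instead query an approximate objective model fitted from the agents' observed (joint action, objective value) samples, which is the role the model-based extension in the final contribution plays.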
Keywords/Search Tags:CLEAN, System, Multiagent, Improve, Exploration, Model, Agents