
Optimal Time Scales for Reinforcement Learning Behaviour Strategies

Posted on: 2011-07-13
Degree: M.Sc.
Type: Thesis
University: McGill University (Canada)
Candidate: Comanici, Gheorghe
GTID: 2442390002467225
Subject: Artificial Intelligence
Abstract/Summary:
Reinforcement Learning is a branch of Artificial Intelligence that addresses single-agent, autonomous, sequential decision making. It proposes computational models that do not rely on complete knowledge of the dynamics of a stochastic environment. Options are a formalism for temporally extending actions into hierarchically organized behaviour, which is used to improve learning in large-scale problems. In this thesis we propose a new approach for generating options. Given controllers or behaviour policies as prior knowledge, we learn how to switch between these policies by optimizing the expected total discounted reward of the resulting hierarchical behaviour. Based on a new representation of option termination, we derive gradient descent-based algorithms for learning optimal termination conditions of options. We provide theoretical guarantees and extensions of widely used Reinforcement Learning algorithms for options with variable time scales. Finally, we incorporate the proposed approach into policy-gradient methods with linear function approximation.
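As an illustration only, the following Python sketch shows the general idea of learning when to switch between fixed behaviour policies by adjusting option termination conditions with a policy-gradient estimate. It is not the thesis's algorithm: the chain environment, the two policies in `POLICIES`, the sigmoid termination function `beta`, the uniform choice of the next option, and the REINFORCE-style likelihood-ratio gradient in `run_episode` and `train` are all hypothetical choices made for this sketch. Discounting is applied per primitive step, so reward collected while an option runs is discounted by gamma^k over its variable k-step duration.

```python
import math
import random

# Hypothetical toy problem: a 6-state chain, start in state 0, reward 1 on reaching state 5.
N_STATES = 6
GAMMA = 0.95


def step(state, action):
    """Deterministic chain dynamics; action is -1 (left) or +1 (right)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


# Given behaviour policies (the "prior knowledge"); both are illustrative assumptions.
POLICIES = [
    lambda s, rng: 1 if rng.random() < 0.8 else -1,   # mostly moves right
    lambda s, rng: -1 if rng.random() < 0.8 else 1,   # mostly moves left
]


def features(state):
    """One-hot state features phi(s)."""
    phi = [0.0] * N_STATES
    phi[state] = 1.0
    return phi


def beta(theta_o, state):
    """Assumed termination parameterization: beta_o(s) = sigmoid(theta_o . phi(s))."""
    z = sum(t * f for t, f in zip(theta_o, features(state)))
    return 1.0 / (1.0 + math.exp(-z))


def run_episode(theta, rng, max_steps=100):
    """Execute options call-and-return style. Returns the discounted return and the
    gradient of the log-probability of all termination decisions w.r.t. theta."""
    state = 0
    option = rng.randrange(len(POLICIES))  # next option chosen uniformly (an assumption)
    ret, discount = 0.0, 1.0
    grad = [[0.0] * N_STATES for _ in POLICIES]
    for _ in range(max_steps):
        state, reward, done = step(state, POLICIES[option](state, rng))
        ret += discount * reward
        discount *= GAMMA  # discount per primitive step, whatever the option's duration
        if done:
            break
        b = beta(theta[option], state)
        terminate = rng.random() < b
        # d log P(decision) / d theta: (1 - b) * phi if terminating, else -b * phi
        coeff = (1.0 - b) if terminate else -b
        for i, f in enumerate(features(state)):
            grad[option][i] += coeff * f
        if terminate:
            option = rng.randrange(len(POLICIES))
    return ret, grad


def train(episodes=3000, lr=0.2, seed=0):
    """REINFORCE-style gradient ascent on expected discounted return,
    updating only the termination parameters theta."""
    rng = random.Random(seed)
    theta = [[0.0] * N_STATES for _ in POLICIES]
    baseline = 0.0  # running average of returns, used to reduce variance
    for _ in range(episodes):
        ret, grad = run_episode(theta, rng)
        advantage = ret - baseline
        baseline += 0.05 * (ret - baseline)
        for o in range(len(POLICIES)):
            for i in range(N_STATES):
                theta[o][i] += lr * advantage * grad[o][i]
    return theta


if __name__ == "__main__":
    theta = train()
    for o, name in enumerate(["right", "left"]):
        probs = [round(beta(theta[o], s), 2) for s in range(N_STATES - 1)]
        print(f"beta({name})", probs)
```

In this toy setting, training should tend to lower the termination probabilities of the right-moving policy and raise those of the left-moving one, because switching away from the policy that heads toward the goal rarely helps. The sketch keeps the option-selection rule fixed so that only the termination conditions are learned.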
Keywords/Search Tags: Behaviour, Options