RL Theory Seminar: Adaptive Reward-Free Exploration
Online
Amii is proud to support our province's growing AI community. The RL Theory Seminars are hosted independently by researchers Gergely Neu, Ciara Pike-Burke, and Amii Fellow Csaba Szepesvári.
Speaker: Pierre Ménard (Inria Lille)
Paper: https://arxiv.org/abs/2006.06294
Authors: Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko
Abstract: Reward-free exploration is a reinforcement learning setting recently studied by Jin et al., who address it by running several algorithms with regret guarantees in parallel. In our work, we instead propose a more adaptive approach for reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm can be seen as a variant of an algorithm of Fiechter from 1994, originally proposed for a different objective that we call best-policy identification. We prove that RF-UCRL needs O((SAH^4/ε^2) ln(1/δ)) episodes to output, with probability 1−δ, an ε-approximation of the optimal policy for any reward function. We empirically compare it to oracle strategies using a generative model.
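To give a rough feel for the adaptive idea in the abstract, here is a minimal Python sketch of an RF-UCRL-style loop on a tabular MDP: it maintains upper bounds E_h(s, a) on the estimation error by backward induction, explores greedily with respect to those bounds, and stops once the bound at the initial state falls below ε/2. The simulator interface `step`, the bonus `beta`, and all constants are illustrative assumptions, not the paper's exact quantities.

```python
import numpy as np

def rf_ucrl_sketch(step, S, A, H, s_init, eps, delta, max_episodes=10_000):
    """Hedged sketch of upper-bound-driven reward-free exploration.

    `step(s, a) -> s_next` is an assumed simulator interface; the bonus
    below is a generic Hoeffding-style placeholder, not the paper's threshold.
    """
    counts = np.zeros((S, A), dtype=int)            # visit counts n(s, a)
    trans_counts = np.zeros((S, A, S), dtype=int)   # transition counts n(s, a, s')

    for t in range(1, max_episodes + 1):
        # Empirical transition kernel (uniform where a pair is unvisited).
        p_hat = np.where(
            counts[..., None] > 0,
            trans_counts / np.maximum(counts[..., None], 1),
            1.0 / S,
        )
        # Illustrative confidence width (assumption, not the paper's constant).
        beta = np.log(2 * S * A * H * max(t, 2) / delta)
        bonus = np.sqrt(beta / np.maximum(counts, 1))

        # Backward induction on error upper bounds E_h(s, a), with E_{H+1} = 0.
        E = np.zeros((H + 2, S, A))
        for h in range(H, 0, -1):
            next_val = E[h + 1].max(axis=-1)          # max_a' E_{h+1}(s', a')
            E[h] = np.minimum(H, H * bonus + p_hat @ next_val)

        # Stop once the error bound at the initial state is small enough.
        if E[1][s_init].max() <= eps / 2:
            return t, counts, trans_counts

        # Explore greedily with respect to the current error upper bounds.
        s = s_init
        for h in range(1, H + 1):
            a = int(E[h][s].argmax())
            s_next = step(s, a)
            counts[s, a] += 1
            trans_counts[s, a, s_next] += 1
            s = s_next

    return max_episodes, counts, trans_counts
```

After stopping, the collected counts define an empirical model from which a near-optimal policy for any reward function can be planned, which is the reward-free guarantee the abstract refers to.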