AI Seminar – Marlos C. Machado
Online
Title: An operator view of policy gradient methods
Abstract: We cast policy gradient methods as the repeated application of two operators: a policy improvement operator 𝙸, which maps any policy π to a better one 𝙸π, and a projection operator 𝙿, which finds the best approximation of 𝙸π in the set of realizable policies. We use this framework to introduce operator-based versions of traditional policy gradient methods such as REINFORCE and PPO, which leads to a better understanding of their original counterparts. We also use the understanding we develop of the role of 𝙸 and 𝙿 to propose a new global lower bound of the expected return. This new perspective allows us to further bridge the gap between policy-based and value-based methods, showing how REINFORCE and the Bellman optimality operator, for example, can be seen as two sides of the same coin.
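The iteration described in the abstract, π_{t+1} = 𝙿(𝙸π_t), can be illustrated with a toy sketch. It assumes a single-state problem with three actions and positive rewards. The improvement operator reweights each action's probability by its reward. The projection minimizes KL(𝙸π ‖ π_θ) over a one-parameter softmax family π_θ(a) ∝ exp(θ·φ(a)). The rewards, features, and function names (`softmax_policy`, `improve`, `project`) are hypothetical choices made for illustration. They are not necessarily the operators used in the talk.

```python
import math

# Hypothetical toy problem: one state, three actions (all values assumed for illustration).
REWARDS = [1.0, 3.0, 2.0]   # positive rewards; the best action is the middle one
FEATURES = [0.0, 1.0, 2.0]  # realizable policies: pi_theta(a) proportional to exp(theta * phi(a))


def softmax_policy(theta):
    """Return the realizable policy pi_theta as a list of action probabilities."""
    logits = [theta * f for f in FEATURES]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    z = sum(weights)
    return [w / z for w in weights]


def expected_return(pi):
    """Expected reward J(pi) of a policy in the single-state problem."""
    return sum(p * r for p, r in zip(pi, REWARDS))


def improve(pi):
    """Improvement operator I: (I pi)(a) = r(a) pi(a) / J(pi); requires rewards > 0."""
    j = expected_return(pi)
    return [p * r / j for p, r in zip(pi, REWARDS)]


def project(mu, theta=0.0, lr=0.5, steps=500):
    """Projection operator P: minimize KL(mu || pi_theta) over theta by gradient descent.

    For this family, d/dtheta KL(mu || pi_theta) = E_{pi_theta}[phi] - E_mu[phi],
    so the minimizer is the realizable policy whose feature mean matches that of mu.
    """
    target = sum(q * f for q, f in zip(mu, FEATURES))
    for _ in range(steps):
        pi = softmax_policy(theta)
        mean = sum(p * f for p, f in zip(pi, FEATURES))
        theta -= lr * (mean - target)
    return theta


theta = 0.0  # start from the uniform policy
for t in range(5):
    theta = project(improve(softmax_policy(theta)), theta)  # pi_{t+1} = P(I pi_t)
    print(t, round(theta, 4), round(expected_return(softmax_policy(theta)), 4))
```

The best action (reward 3.0) is the middle one, so no policy in this family can put all of its mass on it. This is why 𝙿 genuinely matters here. At a fixed point, 𝙸π and π have the same feature mean. That is equivalent to Cov_π(r, φ) = 0, which is exactly the condition under which the gradient of the expected return with respect to θ vanishes.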
Bio: Marlos C. Machado is a research scientist at Google Brain, Montreal. Marlos received his Ph.D. from the Department of Computing Science at the University of Alberta. His research interests lie broadly in artificial intelligence, with a particular focus on reinforcement learning, including topics such as representation learning, generalization, exploration, and temporal abstraction.
The University of Alberta Artificial Intelligence (AI) Seminar is a weekly meeting where researchers (including students, developers, and professors) interested in AI can share their current research. Presenters include local speakers from the University of Alberta and industry, as well as speakers from other institutions. Topics related in any way to Artificial Intelligence are of interest, from foundational theoretical work to innovative applications of AI techniques to new fields and problems. Learn more at the AI Seminar website and by subscribing to the mailing list!