AI Seminar – Csaba Szepesvari
Online
Abstract: Markov decision processes (MDPs) are a minimalist framework designed to capture the most important aspects of decision making under uncertainty, a problem of major practical interest that is thoroughly studied in reinforcement learning. The unfortunate price of the minimalist approach is that MDPs lack structure, and as such planning and learning in MDPs with combinatorial-sized state and action spaces is strongly intractable: Bellman's curse of dimensionality is here to stay in the worst case. However, Bellman and his co-workers apparently realized as early as the 1960s that for many problems of practical interest, the optimal value function of an MDP is well approximated using just a few basis functions that are standardly used in numerical calculations. As knowing the optimal value function is essentially equivalent to knowing how to act optimally, one hopes that there are algorithms that can efficiently compute the few approximating coefficients. If this is possible, we can think of the algorithm as computing the value function in a compressed space. However, until recently not much was known about whether and when these compressed computations are possible. In this talk, I will discuss a few recent results (some positive, some negative) concerning these compressed computations and conclude with some open problems.
Presenter Bio: Csaba Szepesvari is a Canada CIFAR AI Chair, the team lead for the “Foundations” team at DeepMind, and a Professor of Computing Science at the University of Alberta. He earned his PhD in 1999 from Jozsef Attila University in Szeged, Hungary. In addition to regularly publishing in top-tier journals and conferences, he has (co-)authored three books. Currently, he serves as an action editor of the Journal of Machine Learning Research and as an associate editor of the Mathematics of Operations Research journal, in addition to serving regularly on program committees of various machine learning and AI conferences. Dr. Szepesvari's main interest is developing principled, learning-based approaches to artificial intelligence (AI). He is the co-inventor of UCT, an influential Monte-Carlo tree search algorithm, a variant of which was used in the AlphaGo program that, in a landmark game, defeated the top Go professional Lee Sedol in 2016, ten years after the invention of UCT. In 2020, Dr. Szepesvari co-founded the weekly “Reinforcement Learning Theory virtual seminar series”, which showcases top theoretical work in the area of reinforcement learning, with speakers and attendees from all over the world.
The University of Alberta Artificial Intelligence (AI) Seminar is a weekly meeting where researchers (including students, developers, and professors) interested in AI can share their current research. Presenters include local speakers from the University of Alberta and industry, as well as speakers from other institutions. The seminars cover a wide range of topics related in any way to Artificial Intelligence: everything from foundational theoretical work to innovative applications of AI techniques to new fields and problems is of interest. Learn more at the AI Seminar website and by subscribing to the mailing list!