News
Now that the 2020 Tea Time Talks are on YouTube, you can always have time for tea with Amii and the RLAI Lab! Hosted by Amii’s Chief Scientific Advisor, Dr. Richard S. Sutton, these 20-minute talks on technical topics are delivered by students, faculty and guests. The talks are a relaxed and informal way of hearing leaders in AI discuss future lines of research they may explore, with topics ranging from ideas just starting to take root to fully finished projects.
Week ten of the Tea Time Talks features:
In this talk, Abhishek discusses a family of new learning and planning algorithms for average-reward Markov decision processes. Key to these algorithms is the use of the temporal-difference (TD) error to update the reward-rate estimate instead of the conventional error, enabling proofs of convergence in the general off-policy case without recourse to any reference states. Empirically, this generally results in faster learning, while reliance on a reference state can result in slower learning and risks divergence. Abhishek also presents a general technique to estimate the actual ‘centered’ value function rather than the value function plus an offset.
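The reward-rate update described above can be sketched in a few lines. Below is a minimal tabular illustration in Python, assuming a hypothetical env object with reset() and step() methods and a uniform-random behaviour policy; it is a sketch of the idea, not the algorithm exactly as presented in the talk:

```python
import numpy as np

def differential_td0(env, num_steps, alpha=0.1, eta=1.0, seed=0):
    """Tabular differential TD(0) for average-reward prediction (sketch)."""
    rng = np.random.default_rng(seed)
    v = np.zeros(env.num_states)    # differential (relative) value estimates
    r_bar = 0.0                     # reward-rate estimate
    s = env.reset()
    for _ in range(num_steps):
        a = rng.integers(env.num_actions)         # uniform behaviour policy (placeholder)
        s_next, r = env.step(a)
        delta = r - r_bar + v[s_next] - v[s]      # differential TD error
        v[s] += alpha * delta
        r_bar += eta * alpha * delta              # reward rate updated with the TD error,
                                                  # not the conventional error (r - r_bar)
        s = s_next
    return v, r_bar
```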
Spinal cord injury can cause paralysis of the legs. In this talk, Ashley introduces a spinal cord implant that her lab used to generate walking in a cat model. She then describes how they used general value functions (GVFs) and Pavlovian control to produce highly adaptable over-ground walking behaviour.
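For readers unfamiliar with GVFs, here is a minimal sketch of a single linear TD(λ) update for a general value function, which predicts the discounted sum of an arbitrary cumulant signal (such as a sensor reading) under a possibly state-dependent continuation factor. This is a generic illustration only, not the lab’s implementation; all names and parameters are placeholders:

```python
import numpy as np

def gvf_td_lambda_update(w, z, x, x_next, cumulant, gamma, gamma_next, lam, alpha):
    """One linear TD(lambda) update for a general value function (GVF).

    The GVF predicts the discounted sum of an arbitrary cumulant signal
    rather than a reward; gamma and gamma_next are the continuation
    factors at the current and next time steps.
    """
    delta = cumulant + gamma_next * np.dot(w, x_next) - np.dot(w, x)
    z = gamma * lam * z + x            # accumulating eligibility trace
    w = w + alpha * delta * z
    return w, z
```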
In this talk, Alex discusses a model-based RL algorithm based on the optimism principle: in each episode, the set of models that are “consistent” with the data collected so far is constructed. The criterion of consistency is based on the total squared error that the model incurs on the task of predicting values, as determined by the last value estimate, along the observed transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models.
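A rough sketch of the consistency test and the optimistic selection step might look like the following, under assumed interfaces (each candidate model exposes a hypothetical expected_next_value method, and plan is a placeholder for the planning routine); it illustrates the structure of the algorithm rather than its exact details:

```python
import numpy as np

def consistent_models(models, transitions, v_last, beta):
    """Keep the models whose total squared value-prediction error is at most beta.

    transitions is a list of observed (s, a, s_next) tuples and v_last is the
    last value estimate; m.expected_next_value(s, a, v) estimates E_m[v(S') | s, a].
    """
    kept = []
    for m in models:
        err = sum((m.expected_next_value(s, a, v_last) - v_last[s_next]) ** 2
                  for (s, a, s_next) in transitions)
        if err <= beta:
            kept.append(m)
    return kept

def optimistic_values(models, transitions, v_last, beta, plan):
    """Plan with each consistent model and keep, state-wise, the largest values.

    Assumes at least one model passes the consistency test.
    """
    candidates = [plan(m) for m in consistent_models(models, transitions, v_last, beta)]
    return np.max(np.stack(candidates), axis=0)
```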
Policy gradient methods often use a critic baseline to reduce the variance of their gradient estimate. In this talk, Shivam discusses a simple idea for an analogous baseline for the log-likelihood part of the policy gradient. First, Shivam shows that the softmax policy gradient in the bandit case can be written in two different but equivalent expressions, which motivates the log-likelihood baseline. While one of these expressions is the familiar, widely used form, the other does not appear to be common in the literature. Shivam then shows how these expressions can be extended to the full Markov decision process (MDP) case under certain assumptions.
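For context, below is a minimal sketch of the standard softmax policy-gradient update in a bandit with the usual reward (critic) baseline; the alternative expression and the log-likelihood baseline proposed in the talk are not reproduced here, only noted in a comment where they would enter:

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def softmax_pg_step(theta, reward_fn, alpha=0.1, reward_baseline=0.0, rng=None):
    """One stochastic softmax policy-gradient step in a bandit.

    Uses the standard REINFORCE-style estimator
        g = (r - b) * (onehot(a) - pi),
    where (onehot(a) - pi) is the gradient of log pi(a) for a softmax policy
    and b is the usual reward baseline. A baseline on the log-likelihood term,
    as discussed in the talk, would modify the (onehot(a) - pi) factor.
    """
    if rng is None:
        rng = np.random.default_rng()
    pi = softmax(theta)
    a = rng.choice(len(theta), p=pi)
    r = reward_fn(a)
    grad_log_pi = np.eye(len(theta))[a] - pi      # gradient of log pi(a) w.r.t. theta
    return theta + alpha * (r - reward_baseline) * grad_log_pi
```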
The Tea Time Talks have now concluded for the year, but stay tuned as we will be uploading the remaining talks in the weeks ahead. In the meantime, you can rewatch or catch up on previous talks on our YouTube playlist.
Nov 7th 2024
News
Amii partners with pipikwan pêhtâkwan and its startup company wâsikan kisewâtisiwin to harness AI in efforts to challenge misinformation about Indigenous People and to include Indigenous People in the development of AI. The project is supported by the PrairiesCan commitment to accelerate AI adoption among SMEs in the Prairie region.
Nov 7th 2024
News
Amii Fellow and Canada CIFAR AI Chair Russ Greiner and University of Alberta researcher and collaborator David Wishart were awarded the Brockhouse Canada Prize for Interdisciplinary Research in Science and Engineering from the Natural Sciences and Engineering Research Council of Canada (NSERC).
Nov 6th 2024
News
Amii founding member Jonathan Schaeffer has spent 40 years making huge impacts in game theory and AI. Now he’s retiring from academia and sharing some of the insights he’s gained over his impressive career.