Research Post
In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(γ)'s updates in a particular way causes its expected update to become stable under off-policy training. The only prior model-free TD methods to achieve this with per-step computation linear in the number of function approximation parameters are the gradient-TD family of methods including TDC, GTD(γ), and GQ(λ). Compared to these methods, our emphatic TD(λ) is simpler and easier to use; it has only one learned parameter vector and one step-size parameter. Our treatment includes general state-dependent discounting and bootstrapping functions, and a way of specifying varying degrees of interest in accurately valuing different states.
Acknowledgements
The authors thank Hado van Hasselt, Doina Precup, Huizhen Yu, Susan Murphy, and Brendan Bennett for insights and discussions contributing to the results presented in this paper, and the entire Reinforcement Learning and Artificial Intelligence research group for providing the environment to nurture and support this research. We gratefully acknowledge funding from Alberta Innovates – Technology Futures and from the Natural Sciences and Engineering Research Council of Canada.
Feb 1st 2023
Research Post
Read this research paper, co-authored by Fellow & Canada CIFAR AI Chair at Russ Greiner: Towards artificial intelligence-based learning health system for population-level mortality prediction using electrocardiograms
Jan 31st 2023
Research Post
Jan 20th 2023
Research Post
Looking to build AI capacity? Need a speaker at your event?