Research Post

Asymptotically Optimal Information-Directed Sampling

Abstract

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and worst-case rate optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information. Moreover, we uncover a surprising connection between the recently proposed primal-dual methods and the Bayesian IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite-time, and can be significantly better in the asymptotic regime.

Latest Research Papers

Connect with the community

Get involved in Alberta's growing AI ecosystem! Speaker, sponsorship, and letter of support requests welcome.

Explore training and advanced education

Curious about study options under one of our researchers? Want more information on training opportunities?

Harness the potential of artificial intelligence

Let us know about your goals and challenges for AI adoption in your business. Our Investments & Partnerships team will be in touch shortly!