Research Post
A standard metric for measuring the approximate optimality of policies in imperfect information games is exploitability, i.e. the performance of a policy against its worst-case opponent. However, exploitability is intractable to compute in large games, as it requires a full traversal of the game tree to calculate a best response to the given policy. We introduce a new metric, approximate exploitability, that computes an analogous quantity using an approximate best response obtained via search and reinforcement learning. This is a generalization of local best response, a domain-specific evaluation metric used in poker. We provide empirical results for a specific instance of the method, demonstrating that it converges to exploitability in the tabular and function approximation settings for small games. In large games, our method learns to exploit both strong and weak agents, including an AlphaZero agent.
Feb 24th 2022
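To make the notion of exploitability concrete, here is a minimal sketch for a small normal-form game, where the best response can be computed exactly rather than approximated with search and reinforcement learning as in the paper. The game (rock-paper-scissors), the payoff matrix, and the function names are illustrative assumptions, not part of the published method.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors
# (rows/cols ordered: rock, paper, scissors); zero-sum.
PAYOFF = np.array([
    [ 0, -1,  1],
    [ 1,  0, -1],
    [-1,  1,  0],
], dtype=float)

def exploitability(policy: np.ndarray) -> float:
    """Expected payoff a best-responding opponent earns against `policy`.

    Because the game is zero-sum, the opponent's expected payoff for
    each pure action is -(PAYOFF.T @ policy); a best response simply
    plays the maximizing action. A policy is optimal (a Nash
    equilibrium strategy) exactly when this value is zero.
    """
    opponent_values = -(PAYOFF.T @ policy)
    return float(opponent_values.max())

uniform = np.full(3, 1 / 3)      # equilibrium strategy
always_rock = np.array([1.0, 0.0, 0.0])
```

The uniform policy has exploitability 0, while always playing rock is exploited for a full point by an opponent who always plays paper. In large extensive-form games this exact maximization over opponent strategies is intractable, which is what motivates replacing it with an approximate best response.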
Research Post
Read this research paper, co-authored by Amii Fellow and Canada CIFAR AI Chairs Neil Burch and Michael Bowling: Rethinking formal models of partially observable multiagent decision making
Dec 6th 2021
Research Post
Read this research paper, co-authored by Amii Fellow and Canada CIFAR AI Chairs Neil Burch and Michael Bowling: Player of Games