News
This year's AAAI conference takes place in Vancouver from Feb. 20 - 27. (Credit: Adi K / Pexels)
Amii is excited to highlight the research that our scientists and students are presenting at the 38th Annual AAAI Conference on Artificial Intelligence. This year's conference is taking place in Vancouver from Feb. 20 - 27.
AAAI 2024 is hosted by the Association for the Advancement of Artificial Intelligence and is one of the premier international events for AI researchers. The AAAI Conference covers a wide range of topics within AI, including machine learning, computer vision, natural language processing, robotics, and ethical considerations of AI technologies.
In addition to the main technical track, it features the Conference on Innovative Applications of Artificial Intelligence, as well as special tracks on AI for Social Impact and on Safe, Robust and Responsible AI.
Below are papers co-authored by Amii Fellows, Canada CIFAR AI Chairs and early-stage researchers that were accepted at AAAI 2024.
*David Sychrovský; Michal Šustr; Elnaz Davoodi; Michael Bowling; Marc Lanctot; Martin Schmid
Abstract:
The literature on game-theoretic equilibrium finding predominantly focuses on single games or their repeated play. Nevertheless, numerous real-world scenarios feature playing a game sampled from a distribution of similar, but not identical games, such as playing poker with different public cards or trading correlated assets on the stock market. As these similar games feature similar equilibria, we investigate a way to accelerate equilibrium finding on such a distribution. We present a novel "learning not to regret" framework, enabling us to meta-learn a regret minimizer tailored to a specific distribution. Our key contribution, Neural Predictive Regret Matching, is uniquely meta-learned to converge rapidly for the chosen distribution of games, while having regret minimization guarantees on any game. We validated our algorithms' faster convergence on a distribution of river poker games. Our experiments show that the meta-learned algorithms outpace their non-meta-learned counterparts, achieving more than tenfold improvements.
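For readers unfamiliar with regret matching, the classic self-play update that the paper's meta-learned variant builds on can be sketched as follows. This is an illustrative Python sketch of standard regret matching on a matrix game, not the authors' Neural Predictive Regret Matching; all names here are our own.

```python
import numpy as np

def regret_matching(payoffs, iterations=20000):
    """Self-play regret matching for a two-player zero-sum matrix game.

    `payoffs[i][j]` is the row player's payoff when row plays action i
    and column plays action j. Returns the row player's average strategy,
    which converges toward a Nash equilibrium strategy in zero-sum games.
    """
    n, m = payoffs.shape
    regrets_row, regrets_col = np.zeros(n), np.zeros(m)
    strategy_sum = np.zeros(n)

    def to_strategy(regrets):
        # Play each action in proportion to its positive cumulative regret.
        positive = np.maximum(regrets, 0.0)
        total = positive.sum()
        if total > 0:
            return positive / total
        return np.full(regrets.size, 1.0 / regrets.size)

    for _ in range(iterations):
        s_row, s_col = to_strategy(regrets_row), to_strategy(regrets_col)
        strategy_sum += s_row
        u_row = payoffs @ s_col        # row player's value per pure action
        u_col = -(s_row @ payoffs)     # column player's value (zero-sum)
        regrets_row += u_row - s_row @ u_row
        regrets_col += u_col - s_col @ u_col
    return strategy_sum / iterations
```

The paper's contribution is, roughly, to replace this hand-designed update with one meta-learned to converge quickly on a chosen distribution of games while keeping worst-case guarantees.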
Yan Yan; *Yuhong Guo
Abstract:
Not currently available.
Sofia Lemons; Wheeler Ruml; *Rob Holte; Carlos Linares Lopez
Abstract:
Anytime heuristic search algorithms try to find a (potentially suboptimal) solution as quickly as possible and then work to find better and better solutions until an optimal solution is obtained or time is exhausted. The most widely-known anytime search algorithms are based on best-first search. In this paper, we propose a new algorithm, rectangle search, that is instead based on beam search, a variant of breadth-first search. It repeatedly explores alternatives at all depth levels and is thus best-suited to problems featuring deep local minima. Experiments using a variety of popular search benchmarks suggest that rectangle search is competitive with fixed-width beam search and often performs better than the previous best anytime search algorithms.
Shanshan Zhao; Wenhai Cui; *Bei Jiang; *Linglong Kong; Xiaodong Yan
Abstract:
Not currently available.
Yangdi Jiang; Yi Liu; Xiaodong Yan; Anne-Sophie Charest; *Linglong Kong; *Bei Jiang
Abstract:
Differentially private (DP) synthetic datasets have been receiving significant attention from academia, industry, and government. However, little is known about how to perform statistical inference using DP synthetic datasets. Naive approaches that do not take into account the induced uncertainty due to the DP mechanism will result in biased estimators and invalid inferences. In this paper, we present a general class of bias-corrected DP estimators with valid asymptotic confidence intervals for parameters in regression settings, by establishing the connection between additive DP mechanisms and measurement error models. Our simulation shows that when the sample covariance between DP noises and data is close to zero, our estimator is far superior to the widely used sufficient statistic perturbation algorithm, and the CIs can achieve better coverage when compared to the naive CIs obtained from ignoring the DP mechanism.
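The "additive DP mechanism" the abstract refers to can be illustrated with the standard Laplace mechanism: release a statistic plus Laplace noise scaled to the query's sensitivity. This is a minimal sketch of that standard mechanism (names are our own), not the paper's bias-corrected estimator.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release `value` under epsilon-DP by adding Laplace noise with
    scale sensitivity / epsilon (the standard additive mechanism)."""
    scale = sensitivity / epsilon
    return value + rng.laplace(0.0, scale)

# Example: privately release a mean over n records in [0, 1], where
# changing one record shifts the mean by at most 1/n (the sensitivity).
rng = np.random.default_rng(0)
n, true_mean = 1000, 0.37
private_mean = laplace_mechanism(true_mean, sensitivity=1.0 / n,
                                 epsilon=1.0, rng=rng)
```

The added noise has known variance (2 * scale**2), which is exactly what makes the paper's measurement-error perspective possible: inference that ignores this extra variance treats the noisy release as the truth and produces biased estimates and overconfident intervals.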
Taylor Lundy; Narun Raman; Hu Fu; *Kevin Leyton-Brown
Abstract:
Mobile gaming is a rapidly growing and incredibly profitable sector; having grown seven-fold over the past 10 years, it now grosses over $100 billion annually. This growth was due in large part to a shift in monetization strategies: rather than charging players an upfront cost (“pay-to-play”), games often request optional microtransactions throughout gameplay (“free-to-play”). We focus on a common scenario in which games include wait times—gating either items or game progression—that players can pay to skip. Game designers typically say that they optimize for player happiness rather than revenue; however, prices for skips are typically set at levels that few players are willing to pay, leading to low purchase rates. Under a traditional analysis, it would seem that game designers fail at their stated goal if few players buy what they are selling. We argue that an alternate model can better explain this dynamic: players value tasks more highly as they are perceived to be more difficult. While skips can increase players’ utilities by providing instant gratification, pricing skips too cheaply can lower players’ utilities by decreasing the perceived amount of work needed to complete a task. We show that high revenue, high player utility, and low purchase rates can all coexist under this model, particularly under a realistic distribution of players having few buyers but a few big-spending “whales.” We also investigate how a game designer should optimize prices under our model.
Greg d'Eon; Sophie Greenwood; *Kevin Leyton-Brown; *James R. Wright
Abstract:
Not currently available.
Farnaz Kohankhaki; Kiarash Aghakasiri; Hongming Zhang; Ting-Han Wei; Chao Gao; *Martin Müller
Abstract:
Monte Carlo Tree Search (MCTS) is an immensely popular search-based framework used for decision making. It is traditionally applied to domains where a perfect simulation model of the environment is available. We study and improve MCTS in the context where the environment model is given but imperfect. We show that the discrepancy between the model and the actual environment can lead to significant performance degradation with standard MCTS. We therefore develop Uncertainty Adapted MCTS (UA-MCTS), a more robust algorithm within the MCTS framework. We estimate the transition uncertainty in the given model, and direct the search towards more certain transitions in the state space. We modify all four MCTS phases to improve the search behavior by considering these estimates. We prove, in the corrupted bandit case, that adding uncertainty information to adapt UCB leads to tighter regret bound than standard UCB. Empirically, we evaluate UA-MCTS and its individual components on the deterministic domains from the MinAtar test suite. Our results demonstrate that UA-MCTS strongly improves MCTS in the presence of model transition errors.
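The UCB rule that MCTS uses in its selection phase, and that UA-MCTS adapts, can be sketched as follows. This shows only the standard UCB1 selection (names are our own); the paper's uncertainty-adapted weighting is not reproduced here.

```python
import math

def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Pick the child maximizing mean value + c * sqrt(ln N / n),
    the standard UCB1 rule balancing exploitation and exploration.

    `counts[i]` is how often child i was visited; `totals[i]` is the
    sum of returns observed through child i.
    """
    n_parent = sum(counts)
    best, best_score = None, float("-inf")
    for child, (n, total) in enumerate(zip(counts, totals)):
        if n == 0:
            return child  # visit every child at least once first
        score = total / n + c * math.sqrt(math.log(n_parent) / n)
        if score > best_score:
            best, best_score = child, score
    return best
```

UA-MCTS, as described in the abstract, additionally estimates transition uncertainty in the imperfect model and biases selection (and the other MCTS phases) toward transitions the model is more certain about, with a proved tighter regret bound in the corrupted-bandit case.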
Jizhou Wu; Jianye Hao; Tianpei Yang; Xiaotian Hao; Yan Zheng; Weixun Wang; *Matthew E. Taylor
Abstract:
Despite many breakthroughs in recent years, it is still hard for MultiAgent Reinforcement Learning (MARL) algorithms to directly solve complex tasks in MultiAgent Systems (MASs) from scratch. In this work, we study how to use Automatic Curriculum Learning (ACL) to reduce the number of environmental interactions required to learn a good policy. In order to solve a difficult task, ACL methods automatically select a sequence of tasks (i.e., curricula). The idea is to obtain maximum learning progress towards the final task by continuously learning on tasks that match the current capabilities of the learners. The key question is how to measure the learning progress of the learner for better curriculum selection. We propose a novel ACL framework, PrOgRessive mulTiagent Automatic curricuLum (PORTAL), for MASs. PORTAL selects curricula according to two criteria: 1) How difficult is a task, relative to the learners' current abilities? 2) How similar is a task, relative to the final task? By learning a shared feature space between tasks, PORTAL is able to characterize different tasks based on the distribution of features and select those that are similar to the final task. Also, the shared feature space can effectively facilitate the policy transfer between curricula. Experimental results show that PORTAL can train agents to master extremely hard cooperative tasks, which cannot be achieved with previous state-of-the-art MARL algorithms.
Tianpei Yang; Heng You; Jianye Hao; Yan Zheng; *Matthew E. Taylor
Abstract:
Not currently available.
Abu Mohammad Hammad Ali; Boting Yang; *Sandra Zilles
Abstract:
This paper studies the design and analysis of approximation algorithms for aggregating preferences over combinatorial domains, represented using Conditional Preference Networks (CP-nets). Its focus is on aggregating preferences over so-called swaps, for which optimal solutions in general are already known to be of exponential size. We first analyze a trivial 2-approximation algorithm that simply outputs the best of the given input preferences, and establish a structural condition under which the approximation ratio of this algorithm is improved to 4/3. We then propose a polynomial-time approximation algorithm whose outputs are provably no worse than those of the trivial algorithm, but often substantially better. A family of problem instances is presented for which our improved algorithm produces optimal solutions, while, for any ε, the trivial algorithm cannot attain a (2−ε)-approximation. These results may lead to the first polynomial-time approximation algorithm that solves the CP-net aggregation problem for swaps with an approximation ratio substantially better than 2.
Want to stay up-to-date on the latest research from the Amii community? Sign up for our monthly newsletter!