May 9 marks the start of the 2022 International Conference on Autonomous Agents and Multiagent Systems (AAMAS) and Amii is proud to highlight the contributions of our researchers to this year's event.
AAMAS is one of the world's largest and most influential conferences on agents and multiagent systems. It began in 2002 as a merger of three conferences: the International Conference on Autonomous Agents, the International Conference on Multiagent Systems, and the International Workshop on Agent Theories, Architectures, and Languages. This year's conference is being held virtually.
In addition to the papers, Amii Fellow & Canada CIFAR AI Chair Matt Taylor is serving as co-chair of this year's conference.
Have a look at this year's Amii conference papers:
Workshop Papers
Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making
Andrew Butcher, Michael Johanson, Elnaz Davoodi, Dylan Brenneis, Leslie Acker, Adam Parker, Adam White, Joseph Modayil and Patrick Pilarski
Abstract: In this paper, we contribute a multi-faceted study into Pavlovian signalling—a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent. In service of generating and receiving signals, humans and other animals are known to represent time, determine time since past events, predict the time until a future stimulus, and both recognize and generate patterns that unfold in time. We investigate how different temporal processes impact coordination and signalling between learning agents by introducing a partially observable decision-making domain we call the Frost Hollow. In this domain, a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that works to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variations: machine agents interacting in a seven-state linear walk, and human-machine interaction in a virtual-reality environment. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently. We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals. Our results therefore suggest an actionable, constructivist path towards communication learning between reinforcement learning agents.
Assessing Human Interaction in Virtual Reality With Continually Learning Prediction Agents Based on Reinforcement Learning Algorithms: A Pilot Study
Dylan J. A. Brenneis, Adam S. R. Parker, Michael Bradley Johanson, Andrew Butcher, Elnaz Davoodi, Leslie Acker, Matthew M. Botvinick, Joseph Modayil, Adam White and Patrick M. Pilarski
Abstract: Artificial intelligence systems increasingly involve continual learning to enable flexibility in general situations that are not encountered during system training. Human interaction with autonomous systems is broadly studied, but research has hitherto under-explored interactions that occur while the system is actively learning, and can noticeably change its behaviour in minutes. In this pilot study, we investigate how the interaction between a human and a continually learning prediction agent develops as the agent develops competency. Additionally, we compare two different agent architectures to assess how representational choices in agent design affect the human-agent interaction. We develop a virtual reality environment and a time-based prediction task wherein learned predictions from a reinforcement learning (RL) algorithm augment human predictions. We assess how a participant’s performance and behaviour in this task differs across agent types, using both quantitative and qualitative analyses. Our findings suggest that human trust of the system may be influenced by early interactions with the agent, and that trust in turn affects strategic behaviour, but limitations of the pilot study rule out any conclusive statement. We identify trust as a key feature of interaction to focus on when considering RL-based technologies, and make several recommendations for modification to this study in preparation for a larger-scale investigation. A video summary of this paper can be found at https://youtu.be/oVYJdnBqTwQ.
Work-in-Progress: Multi-Teacher Curriculum Design for Sparse Reward Environments
Chaitanya Kharyal, Tanmay Sinha and Matthew Taylor
Abstract: While reinforcement learning agents have had many impressive successes, such agents can often face difficulty in sparse reward environments. Agents often face this difficulty in real-world tasks: it can take a long time before an agent stumbles upon a rare positive outcome without guidance. To combat this problem, we propose a technique that we call Adversarial Multi-Teacher Curriculum Design with Traces. This technique involves multiple independent teachers engaged in a game against a goal-conditioned student. The primary algorithmic novelty, relative to existing work, is engaging multiple teachers and using a behavior cloning loss. In addition, we also introduce a new sparse reward environment for simulated driving in PyBullet. Empirical results show the potential of our algorithm in this novel domain.
Methodical Advice Collection and Reuse in Deep Reinforcement Learning
Sahir, Ercüment İlhan, Srijita Das and Matthew Taylor
Abstract: Reinforcement learning (RL) has shown great success in solving many challenging tasks via use of deep neural networks. Although using deep learning for RL brings immense representational power, it also causes a well-known sample-inefficiency problem. This means that the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is to use action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainties about when a student should ask for advice and if the student can model the teacher to ask for less advice. The student could decide to ask for advice when it is uncertain or when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method to compute uncertainty for a deep RL agent using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.
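For readers curious how the "dual uncertainty" idea in this abstract might look in practice, here is a rough sketch of the decision rule it describes: the student asks for advice either when it is uncertain, or only when both it and its model of the teacher are uncertain. This is an illustration, not the authors' implementation. The class `AdviceGate`, the function `choose_action`, the threshold values and the `ask_teacher` callback are all hypothetical names invented for this example, and the uncertainty values are assumed to be supplied from elsewhere (the paper estimates them with a secondary neural network, which is not shown here).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AdviceGate:
    """Hypothetical gate deciding whether a student should query the teacher."""

    student_threshold: float        # assumed cutoff on the student's own uncertainty
    teacher_model_threshold: float  # assumed cutoff on the student's teacher-model uncertainty
    require_both: bool = True       # True: ask only when both estimates are uncertain

    def should_ask(self, student_unc: float, model_unc: float) -> bool:
        student_unsure = student_unc > self.student_threshold
        if not self.require_both:
            # Single-uncertainty variant: ask whenever the student is unsure.
            return student_unsure
        model_unsure = model_unc > self.teacher_model_threshold
        return student_unsure and model_unsure


def choose_action(
    gate: AdviceGate,
    student_action: int,
    student_unc: float,
    modelled_action: int,
    model_unc: float,
    ask_teacher: Callable[[], int],
) -> Tuple[int, str]:
    """Return (action, source): ask the teacher, reuse modelled advice, or act alone."""
    if gate.should_ask(student_unc, model_unc):
        return ask_teacher(), "asked"
    if student_unc > gate.student_threshold:
        # Student is unsure but its model of the teacher is confident:
        # reuse the modelled advice instead of spending a teacher query.
        return modelled_action, "reused"
    return student_action, "own"


if __name__ == "__main__":
    gate = AdviceGate(student_threshold=0.5, teacher_model_threshold=0.5)
    # Both uncertain, so the (stand-in) teacher is queried.
    print(choose_action(gate, 0, 0.9, 1, 0.9, lambda: 2))
```

In this sketch, setting `require_both=False` gives the simpler variant in which the student asks whenever it alone is uncertain, so modelled advice is never reused.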