ConQUR: Mitigating Delusional Bias in Deep Q-learning

Abstract

Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
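The abstract does not give the penalty's exact form. Below is a minimal sketch of one plausible version, assuming three things:

- a linear Q-approximator over a hypothetical feature table `phi`;
- a hinge-style penalty;
- a squared-error loss on the Q-labels.

Here `sigma` is a hypothetical map from each state to the action that the batch's labels implicitly commit to as greedy there, for example the argmax used in a max-backup. The weight `lam` is likewise an illustrative assumption. None of these names are the paper's notation or code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature table: phi[(state, action)] -> feature vector (assumption).
FeatureTable = Dict[Tuple[str, int], List[float]]


def q_value(theta: Sequence[float], phi: FeatureTable, s: str, a: int) -> float:
    """Linear Q-approximator: Q_theta(s, a) = theta . phi(s, a)."""
    return sum(w * x for w, x in zip(theta, phi[(s, a)]))


def consistency_penalty(theta: Sequence[float], phi: FeatureTable,
                        sigma: Dict[str, int], actions: Sequence[int]) -> float:
    """Assumed hinge penalty on violated greedy commitments.

    sigma maps each state to the action the Q-labels implicitly treat as
    greedy. Every other action whose Q-value exceeds the committed action's
    value is charged the size of that gap. The penalty is zero exactly when
    no action strictly beats the committed one at any state in sigma.
    """
    total = 0.0
    for s, a_star in sigma.items():
        q_star = q_value(theta, phi, s, a_star)
        for a in actions:
            if a != a_star:
                total += max(0.0, q_value(theta, phi, s, a) - q_star)
    return total


def penalized_loss(theta: Sequence[float], phi: FeatureTable,
                   batch: Sequence[Tuple[str, int, float]],
                   sigma: Dict[str, int], actions: Sequence[int],
                   lam: float) -> float:
    """Mean squared error on (state, action, label) targets plus lam * penalty."""
    mse = sum((q_value(theta, phi, s, a) - y) ** 2 for s, a, y in batch) / len(batch)
    return mse + lam * consistency_penalty(theta, phi, sigma, actions)


if __name__ == "__main__":
    # Toy two-state, two-action problem (made-up numbers for illustration).
    phi = {("s0", 0): [1.0, 0.0], ("s0", 1): [0.0, 1.0],
           ("s1", 0): [1.0, 1.0], ("s1", 1): [0.0, 0.5]}
    theta = [1.0, 2.0]  # Q(s0,0)=1, Q(s0,1)=2, Q(s1,0)=3, Q(s1,1)=1
    actions = [0, 1]
    # Committing to action 0 at s0 is violated by a gap of 1.0.
    print(consistency_penalty(theta, phi, {"s0": 0, "s1": 0}, actions))  # -> 1.0
    # Committing to action 1 at s0 is consistent with the greedy policy.
    print(consistency_penalty(theta, phi, {"s0": 1, "s1": 0}, actions))  # -> 0.0
    batch = [("s0", 0, 1.5), ("s1", 0, 3.0)]
    print(penalized_loss(theta, phi, batch, {"s0": 0, "s1": 0}, actions, 0.5))  # -> 0.625
```

The penalty is added to the regression loss with weight `lam`, so the optimizer trades fit to the labels against consistency with a single greedy policy. One way to reflect the abstract's "across training batches" is to let `sigma` accumulate commitments from earlier batches. That choice is an assumption of this sketch.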