Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

Abstract

We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model, under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mathrm{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(\exp(\Omega(d)), \Omega(2^H))$ samples in the fixed-horizon setting and $\exp(\Omega(d))$ samples in the discounted setting. We also show that for any $\delta > 0$, the least-squares value iteration algorithm with $O(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.
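The abstract names least-squares value iteration (LSVI) with a generative model as the algorithm behind the fixed-horizon upper bound. As a rough illustration only, the pure-Python sketch below shows a generic backward LSVI loop: at each stage $h = H-1, \dots, 0$ it queries the generative model $n$ times at each point of a fixed design set, regresses the bootstrapped targets $r + \max_b \phi(s', b)^\top \hat\theta_{h+1}$ onto $\phi(s,a)$ with a small ridge term, and acts greedily with respect to the fitted parameters. All names (`lsvi`, `phi`, `sample`, `ridge_fit`, `greedy`), the ridge parameter `lam`, the per-point sample count `n`, the design set, and the toy MDP are hypothetical. This is an assumption-laden sketch, not the authors' exact procedure, and it does not reproduce the $O(H^5 d^{H+1}/\delta^2)$ analysis.

```python
import random
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:])) / M[r][r]
    return x


def ridge_fit(X: List[Vector], y: Vector, lam: float) -> Vector:
    """Ridge regression: theta = (X^T X + lam * I)^{-1} X^T y."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def lsvi(phi, actions, design, sample, H, n, lam=1e-6, seed=0):
    """Generic backward LSVI with a generative model (hypothetical interface).

    phi(s, a) -> feature vector; sample(s, a, rng) -> (reward, next_state).
    Uses H * len(design) * n generative-model queries in total.
    Returns thetas, where phi(s, a) . thetas[h] estimates Q*_h(s, a).
    """
    rng = random.Random(seed)
    thetas: List[Vector] = [[] for _ in range(H)]
    for h in range(H - 1, -1, -1):  # stages H-1, ..., 0
        X, y = [], []
        for s, a in design:
            for _ in range(n):
                r, s_next = sample(s, a, rng)
                # Bootstrapped target; the value after the last stage is zero.
                v_next = 0.0 if h == H - 1 else max(
                    dot(phi(s_next, b), thetas[h + 1]) for b in actions)
                X.append(phi(s, a))
                y.append(r + v_next)
        thetas[h] = ridge_fit(X, y, lam)
    return thetas


def greedy(phi, actions, theta, s):
    """Greedy action at state s under a fitted stage parameter theta."""
    return max(actions, key=lambda a: dot(phi(s, a), theta))


# Hypothetical toy MDP: 2 states, 2 actions, one-hot features (d = 4), so Q* is
# trivially realizable. Action a moves deterministically to state a; reward is 1 iff a == s.
states, actions = [0, 1], [0, 1]


def phi(s, a):
    v = [0.0] * 4
    v[2 * s + a] = 1.0
    return v


def sample(s, a, rng):
    return (1.0 if a == s else 0.0), a


design = [(s, a) for s in states for a in actions]
thetas = lsvi(phi, actions, design, sample, H=2, n=1)
print([round(dot(phi(s, a), thetas[0]), 3) for s, a in design])  # [2.0, 1.0, 1.0, 2.0]
print([greedy(phi, actions, thetas[0], s) for s in states])  # [0, 1]
```

The one-hot features make the toy effectively tabular, so every regression target is exactly linear in $\phi$. With genuinely low-dimensional features where only the optimal action-value function is assumed linear, estimation error in $\hat\theta_{h+1}$ can make the targets at stage $h$ non-realizable, so this naive loop by itself carries no guarantee.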
