Events

IFML Seminar

Computationally Efficient Reinforcement Learning with Linear Bellman Completeness

Noah Golowich, PhD Student, MIT


The University of Texas at Austin
Gates Dell Complex (GDC 6.302)
2317 Speedway
Austin, TX 78712
United States

Event Registration
Abstract: One of the most natural approaches to reinforcement learning (RL) with function approximation is value iteration, which inductively generates approximations to the optimal value function by solving a sequence of regression problems. To guarantee that value iteration succeeds, it is typically assumed that Bellman completeness holds, which ensures that these regression problems are well-specified. I will discuss new results on the problem of learning an optimal policy under Bellman completeness in the online and offline models of RL with linear function approximation.
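
For context, a standard way to state the linear Bellman completeness condition (a common textbook formulation in our own notation, not necessarily the exact definition used in the talk) is that the Bellman backup of any linear value function is again linear in the features: for a feature map \(\phi(s,a) \in \mathbb{R}^d\),

\[
\forall\, \theta \in \mathbb{R}^d \;\; \exists\, \theta' \in \mathbb{R}^d \;\text{ such that }\;
\phi(s,a)^\top \theta' \;=\; r(s,a) \;+\; \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\Big[\max_{a'}\, \phi(s',a')^\top \theta\Big]
\quad \text{for all } (s,a).
\]

Under this condition, each regression step of value iteration is well-specified, since the regression target itself lies in the linear function class.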
 
In the online setting, while statistically efficient algorithms are known under Bellman completeness, these algorithms all rely on the principle of global optimism, which requires solving a nonconvex optimization problem. We give the first polynomial-time algorithm for this setting when the number of actions is any constant. In the offline setting, we give the first algorithm that finds a near-optimal policy under a single-policy coverage condition. Our algorithm is also computationally efficient and generalizes to the setting with small inherent Bellman error, in which case the suboptimality of the output policy scales with the square root of the Bellman error. We also prove a matching lower bound, showing that no algorithm can achieve better than square-root dependence on the Bellman error. This lower bound stands in contrast to many other settings in reinforcement learning with misspecification, where one can typically obtain performance that degrades linearly with the misspecification error.
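
Schematically, writing \(\varepsilon\) for the inherent Bellman error, the offline guarantee described above says the returned policy \(\hat{\pi}\) satisfies (with problem-dependent factors such as feature dimension, horizon, and coverage coefficient suppressed; this is an illustrative restatement of the scaling, not the precise theorem statement)

\[
V^{\pi^\star} - V^{\hat{\pi}} \;\lesssim\; \mathrm{poly}(\text{problem parameters}) \cdot \sqrt{\varepsilon} \;+\; \text{(statistical error)},
\]

and the matching lower bound shows that the square-root dependence on \(\varepsilon\) is unavoidable.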
 
Speaker bio: Noah Golowich is a PhD student at the Massachusetts Institute of Technology, advised by Constantinos Daskalakis and Ankur Moitra. His research interests lie in theoretical machine learning, with a particular focus on the connections between multi-agent learning, game theory, and online learning, as well as on theoretical reinforcement learning. He was supported by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Research Fellowship.