Deep RL for Index Policy in Restless Bandit Problems
I-Hong Hou, Associate Professor, ECE Department, Texas A&M University
The University of Texas at Austin
Abstract: The Whittle index policy is a powerful tool for obtaining asymptotically optimal solutions to the notoriously intractable restless bandit problem. However, computing the Whittle indices remains difficult for many practical restless bandits whose transition kernels are convoluted or unknown. In this talk, we discuss using deep reinforcement learning to train a neural network that predicts the Whittle indices. We leverage a fundamental property: the Whittle index can be viewed as an optimal control policy for a set of infinitely many MDPs. We further show that, despite the need to consider infinitely many MDPs, the policy gradient admits a simple expression. Simulation results show that our algorithm learns the Whittle indices much faster than several recent approaches that learn them through indirect means.
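To make the "one MDP per activation cost" view concrete, here is a minimal sketch for a single arm with a known transition kernel. It assumes a discounted criterion, an indexable arm, and indices lying in [-10, 10]. The function names q_values, whittle_index and index_policy are hypothetical, and the value-iteration-plus-bisection approach is a classical baseline for known kernels, not the deep RL method described in the talk. For each activation cost lam the sketch solves the corresponding MDP. The Whittle index of a state is the cost at which activating and resting become equally attractive, and the index policy then activates the arms whose current states have the largest indices.

```python
# Sketch of the "one MDP per activation cost" view of the Whittle index.
# Assumptions (not from the talk): a single arm with a KNOWN kernel, a
# discounted criterion, and plain value iteration plus bisection. The talk
# instead learns the indices with deep RL when the kernel is unknown.
from typing import List

Matrix = List[List[float]]


def q_values(P: List[Matrix], r: Matrix, lam: float,
             beta: float = 0.9, iters: int = 500) -> Matrix:
    """Value iteration for the MDP in which activating the arm costs lam.

    P[a][s][t]: probability of moving from s to t under action a
                (a = 0 is passive, a = 1 is active).
    r[s][a]:    immediate reward for action a in state s.
    Returns Q[s][a] for this lam-MDP.
    """
    n = len(r)
    V = [0.0] * n
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(iters):
        for s in range(n):
            for a in (0, 1):
                future = sum(P[a][s][t] * V[t] for t in range(n))
                Q[s][a] = r[s][a] - lam * a + beta * future
        V = [max(q) for q in Q]  # Bellman update using the previous V
    return Q


def whittle_index(P: List[Matrix], r: Matrix, s: int, lo: float = -10.0,
                  hi: float = 10.0, tol: float = 1e-6) -> float:
    """Bisection over lam. Assuming the arm is indexable and its index lies in
    [lo, hi], activating state s is optimal exactly when lam < index(s)."""
    while hi - lo > tol:
        lam = (lo + hi) / 2
        Q = q_values(P, r, lam)
        if Q[s][1] > Q[s][0]:
            lo = lam  # activation still pays off, so the index is above lam
        else:
            hi = lam  # resting is at least as good, so the index is at most lam
    return (lo + hi) / 2


def index_policy(indices: List[float], m: int) -> List[int]:
    """Whittle index policy: activate the m arms with the largest indices."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(order[:m])


if __name__ == "__main__":
    # Toy arm whose action does not affect transitions, so each index reduces
    # to the immediate gain r[s][1] - r[s][0] (here 1, 2 and 3).
    T = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
    P = [T, T]
    r = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
    idx = [whittle_index(P, r, s) for s in range(3)]
    print([round(w, 3) for w in idx])  # [1.0, 2.0, 3.0]
    # Three arms currently in states 0, 2 and 1; activate two of them.
    print(index_policy([idx[0], idx[2], idx[1]], 2))  # [1, 2]
```

Because activating a state is optimal exactly when lam is below that state's index, thresholding the indices at any lam recovers the optimal policy of that lam-MDP. This is why a single index function can act as the optimal policy for the whole infinite family of MDPs.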
Bio: Dr. I-Hong Hou is an Associate Professor in the ECE Department of Texas A&M University. His research interests include cloud/edge computing, wireless networks, and machine learning. Dr. Hou received his B.S. in Electrical Engineering from National Taiwan University in 2004 and his M.S. and Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2008 and 2011, respectively. He received Best Paper Awards at ACM MobiHoc 2020 and ACM MobiHoc 2017, the Best Student Paper Award at WiOpt 2017, and the C.W. Gear Outstanding Graduate Student Award from the University of Illinois at Urbana-Champaign.