State Abstraction for Programmable Reinforcement Learning Agents
Abstract: Safe state abstraction in reinforcement learning allows an agent to ignore aspects of its current state that are irrelevant to its current decision, and therefore speeds up dynamic programming and learning. Like Dietterich's MAXQ framework, this paper develops methods for safe state abstraction in the context of hierarchical reinforcement learning, in which a hierarchical partial program is used to constrain the policies that are considered. We extend techniques from MAXQ to the context of programmable hierarchical abstract machines (PHAMs), which express complex parameterized behaviors using a simple extension of the Lisp language. We show that our methods preserve the property of hierarchical optimality, i.e., optimality among all policies consistent with the PHAM program. We also show how our methods allow safe detachment, encapsulation, and transfer of learned "subroutine" behaviors, and demonstrate our methods on Dietterich's taxi domain.
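As an illustration of the basic idea only (this is not the PHAM/ALisp machinery developed in the paper), the following minimal sketch shows a tabular value function that ignores irrelevant state variables. It assumes a taxi-like state with four variables and assumes that, inside a navigation subroutine, the value accumulated by the subroutine depends only on the taxi position and the navigation target. The names `TaxiState`, `navigate_abstraction`, and `AbstractedQTable` are hypothetical and introduced here for illustration.

```python
from collections import defaultdict
from typing import Callable, Hashable, NamedTuple, Tuple


class TaxiState(NamedTuple):
    """Hypothetical full state of a taxi-like domain."""
    taxi_x: int
    taxi_y: int
    passenger_loc: int  # assumed encoding: 0-3 = landmark, 4 = in taxi
    destination: int    # assumed encoding: 0-3 = landmark


def navigate_abstraction(state: TaxiState, target: Tuple[int, int]) -> Hashable:
    """Project the full state onto the variables assumed relevant to navigation.

    Passenger location and destination are dropped: under the stated
    assumption they do not affect the value of moving toward `target`.
    """
    return (state.taxi_x, state.taxi_y, target)


class AbstractedQTable:
    """Tabular Q-function indexed by an abstracted state rather than the full state."""

    def __init__(self, abstraction: Callable[[TaxiState, Tuple[int, int]], Hashable]):
        self.abstraction = abstraction
        self.q = defaultdict(float)  # unseen entries default to 0.0

    def get(self, state: TaxiState, target: Tuple[int, int], action: str) -> float:
        return self.q[(self.abstraction(state, target), action)]

    def update(self, state: TaxiState, target: Tuple[int, int], action: str,
               td_target: float, alpha: float = 0.1) -> None:
        """Move the stored value a fraction `alpha` toward `td_target`."""
        key = (self.abstraction(state, target), action)
        self.q[key] += alpha * (td_target - self.q[key])


if __name__ == "__main__":
    table = AbstractedQTable(navigate_abstraction)
    s1 = TaxiState(taxi_x=2, taxi_y=3, passenger_loc=0, destination=1)
    s2 = TaxiState(taxi_x=2, taxi_y=3, passenger_loc=4, destination=3)
    target = (0, 0)

    # Experience gathered in s1 is immediately reused in s2, because the two
    # states differ only in variables the abstraction discards.
    table.update(s1, target, "north", td_target=-1.0)
    print(table.get(s1, target, "north") == table.get(s2, target, "north"))
```

Because every full state that agrees on the retained variables shares one table entry, the table is smaller and each update generalizes across the discarded variables. Such an abstraction is "safe" only when the discarded variables truly cannot change the quantity being stored, which is the condition the paper's methods are designed to guarantee.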