A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs
Roy Fox, and Moshe Tennenholtz
22nd Conference on Artificial Intelligence (AAAI), 2007
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP which extends an UMDP by allowing a particular costly action which completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.