Obtaining Approximately Admissible Heuristic Functions through Deep Reinforcement Learning and A* Search

Forest Agostinelli, Stephen McAleer, Alexander Shmakov, Roy Fox, Marco Valtorta, Biplav Srivastava, and Pierre Baldi

Bridging the Gap between AI Planning and Reinforcement Learning workshop (PRL @ ICAPS), 2021

Deep reinforcement learning has been shown to be able to train deep neural networks to implement effective heuristic functions that can be used with A* search to solve problems with large state spaces. However, these learned heuristic functions are not guaranteed to be admissible. We introduce approximately admissible conversion, an algorithm that can convert any inadmissible heuristic function into a heuristic function that is admissible in the vast majority of cases with no domain-specific heuristic information. We apply approximately admissible conversion to heuristic functions parameterized by deep neural networks and show that these heuristic functions can be used to find optimal solutions, or bounded suboptimal solutions, even when doing a batched version of A* search. We test our method on the 15-puzzle and 24-puzzle and obtain a heuristic function that is empirically admissible over 99.99% of the time and that finds optimal solutions for 100% of all test configurations. To the best of our knowledge, this is the first demonstration that approximately admissible heuristics can be obtained using deep neural networks in a domain independent fashion.