Title

Multi-Task Option Learning and Discovery for Stochastic Path Planning

Authors

Naman Shah, Siddharth Srivastava

Abstract

This paper addresses the problem of reliably and efficiently solving broad classes of long-horizon stochastic path planning problems. Starting with a vanilla RL formulation with a stochastic dynamics simulator and an occupancy matrix of the environment, our approach computes useful options with policies as well as high-level paths that compose the discovered options. Our main contributions are (1) data-driven methods for creating abstract states that serve as endpoints for helpful options, (2) methods for computing option policies using auto-generated option guides in the form of dense pseudo-reward functions, and (3) an overarching algorithm for composing the computed options. We show that this approach yields strong guarantees of executability and solvability: under fairly general conditions, the computed option guides lead to composable option policies and consequently ensure downward refinability. Empirical evaluation on a range of robots, environments, and tasks shows that this approach effectively transfers knowledge across related tasks and that it outperforms existing approaches by a significant margin.
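The abstract describes option guides as dense pseudo-reward functions that steer each computed option toward an abstract-state endpoint. The minimal sketch below is only an illustration of that idea, not the paper's construction: it assumes (hypothetically) that an endpoint can be summarized by a target point in the occupancy grid, and that progress toward it yields the dense signal. The class name `OptionGuide` and all parameters are illustrative assumptions.

```python
import numpy as np

class OptionGuide:
    """Hypothetical sketch: a dense pseudo-reward guiding one option to its endpoint."""

    def __init__(self, target_point, goal_radius=0.5, step_cost=0.01):
        self.target = np.asarray(target_point, dtype=float)
        self.goal_radius = goal_radius  # distance at which the option terminates
        self.step_cost = step_cost      # small per-step penalty to favor short paths

    def pseudo_reward(self, state, next_state):
        """Reward the per-step decrease in distance to the option's endpoint."""
        d_prev = np.linalg.norm(np.asarray(state, dtype=float) - self.target)
        d_next = np.linalg.norm(np.asarray(next_state, dtype=float) - self.target)
        return (d_prev - d_next) - self.step_cost

    def terminated(self, state):
        """Terminate the option once the agent is inside the endpoint region."""
        return np.linalg.norm(np.asarray(state, dtype=float) - self.target) <= self.goal_radius
```

Rewarding the decrease in distance keeps the training signal dense at every step, which is the property the abstract attributes to option guides; a high-level path would then chain several such options, each with its own endpoint.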
