Towards Task-Prioritized Policy Composition

Paper Title

Towards Task-Prioritized Policy Composition

Authors

Rietz, Finn; Schaffernicht, Erik; Stoyanov, Todor; Stork, Johannes A.

Abstract

Combining learned policies in a prioritized, ordered manner is desirable because it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null-space of high-priority control actions. No comparable method currently exists for Reinforcement Learning. We propose a novel, task-prioritized composition framework for Reinforcement Learning built on a new concept: the indifference-space of Reinforcement Learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents. Further, our approach can ensure that high-priority constraints are satisfied, which makes it promising for learning in safety-critical domains such as robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference-space of higher-level policies after the initial compound policy has been constructed.
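For readers unfamiliar with the control-theory baseline the abstract refers to, the following is a minimal sketch of classical null-space (task-priority) composition in plain Python. It assumes a single high-priority task described by one nonzero Jacobian row `j_high` and a desired task velocity `xdot_high`, so the pseudoinverse has the closed form `J⁺ = Jᵀ / (J Jᵀ)`. The function and argument names are hypothetical. The sketch shows the textbook projection `u = J⁺ ẋ + (I − J⁺ J) u_low`, not the reinforcement learning method proposed in the paper.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def null_space_compose(j_high: Vector, xdot_high: float, u_low: Vector) -> Vector:
    """Classical null-space composition for ONE high-priority task row (assumption).

    Returns u = J^+ xdot_high + (I - J^+ J) u_low, using the closed-form
    pseudoinverse J^+ = J^T / (J J^T), valid for a single nonzero row J.
    """
    norm_sq = dot(j_high, j_high)
    if norm_sq == 0.0:
        raise ValueError("high-priority Jacobian row must be nonzero")

    # High-priority part: minimum-norm action that achieves the task velocity.
    u_high = [xdot_high * j / norm_sq for j in j_high]

    # Low-priority part projected onto the null space of j_high:
    # (I - J^T J / (J J^T)) u_low = u_low - j_high * (j_high . u_low) / |j_high|^2
    coeff = dot(j_high, u_low) / norm_sq
    u_low_projected = [u - coeff * j for u, j in zip(u_low, j_high)]

    # The projected term cannot change the high-priority task velocity.
    return [a + b for a, b in zip(u_high, u_low_projected)]


if __name__ == "__main__":
    print(null_space_compose([1.0, 0.0], 0.5, [2.0, 3.0]))  # [0.5, 3.0]
```

With `j_high = [1.0, 0.0]`, the composed action achieves the high-priority task velocity exactly, and the low-priority action survives only in its second component, the direction that does not affect the high-priority task. In this classical scheme the low-priority action is only filtered, not re-optimized for the compound task; the abstract positions online learning in the indifference-space as the way around that limitation.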
