Paper Title
Reward Learning using Structural Motifs in Inverse Reinforcement Learning
Paper Authors
Paper Abstract
The Inverse Reinforcement Learning (\textit{IRL}) problem has seen rapid evolution in the past few years, with important applications in domains like robotics, cognition, and health. In this work, we explore the inefficacy of current IRL methods in learning an agent's reward function from expert trajectories depicting long-horizon, complex sequential tasks. We hypothesize that imbuing IRL models with structural motifs capturing underlying tasks can enable and enhance their performance. Subsequently, we propose a novel IRL method, SMIRL, that first learns the (approximate) structure of a task as a finite-state automaton (FSA), then uses the structural motif to solve the IRL problem. We test our model on both discrete grid world and high-dimensional continuous domain environments. We empirically show that our proposed approach successfully learns all four complex tasks on which two foundational IRL baselines fail. Our model also outperforms the baselines in sample efficiency on a simpler toy task. We further show promising test results in a modified continuous domain on tasks with compositional reward functions.
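To make the two-phase idea in the abstract concrete, here is a minimal sketch, not the paper's actual algorithm or API: it assumes expert trajectories have already been mapped to sequences of discrete high-level events, approximates the FSA as a naive prefix tree over those sequences, and then augments each MDP state with the current FSA state so that a standard IRL method can assign stage-dependent rewards. The names `learn_fsa_approx` and `augment_trajectory` are hypothetical.

```python
def learn_fsa_approx(event_traces):
    """Phase 1: approximate the task structure as a finite-state automaton.

    Here the FSA is a naive prefix tree over observed event sequences; the
    paper learns a more compact automaton, but the interface is the same:
    a transition table mapping (fsa_state, event) -> fsa_state.
    """
    transitions = {}
    state_of_prefix = {(): 0}  # each distinct event prefix gets an FSA state id
    for trace in event_traces:
        prefix = ()
        for event in trace:
            nxt = prefix + (event,)
            if nxt not in state_of_prefix:
                state_of_prefix[nxt] = len(state_of_prefix)
            transitions[(state_of_prefix[prefix], event)] = state_of_prefix[nxt]
            prefix = nxt
    return transitions


def augment_trajectory(states, events, fsa):
    """Phase 2 (setup): pair each MDP state with the current FSA state.

    An off-the-shelf IRL method run on these (state, fsa_state) pairs can
    then reward the same MDP state differently at different stages of a
    long-horizon task.
    """
    fsa_state, augmented = 0, []
    for s, e in zip(states, events):
        augmented.append((s, fsa_state))
        fsa_state = fsa.get((fsa_state, e), fsa_state)  # ignore unseen events
    return augmented


# Toy usage: two expert traces of a "fetch key, then open door" task.
traces = [["got_key", "opened_door"], ["got_key", "opened_door"]]
fsa = learn_fsa_approx(traces)
print(augment_trajectory(["s0", "s1"], traces[0], fsa))
# -> [('s0', 0), ('s1', 1)]
```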