Paper title
Elaborating on Learned Demonstrations with Temporal Logic Specifications
Authors
Abstract
Most current methods for learning from demonstrations assume that those demonstrations alone are sufficient to learn the underlying task. This is often untrue, especially if extra safety specifications exist which were not present in the original demonstrations. In this paper, we allow an expert to elaborate on their original demonstration with additional specification information using linear temporal logic (LTL). Our system converts LTL specifications into a differentiable loss. This loss is then used to learn a dynamic movement primitive that satisfies the underlying specification, while remaining close to the original demonstration. Further, by leveraging adversarial training, our system learns to robustly satisfy the given LTL specification on unseen inputs, not just those seen in training. We show that our method is expressive enough to work across a variety of common movement specification patterns such as obstacle avoidance, patrolling, keeping steady, and speed limitation. In addition, we show that our system can modify a base demonstration with complex specifications by incrementally composing multiple simpler specifications. We also implement our system on a PR-2 robot to show how a demonstrator can start with an initial (sub-optimal) demonstration, then interactively improve task success by including additional specifications enforced with our differentiable LTL loss.
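To make the idea of turning a temporal specification into a loss more concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: it assumes a common quantitative reading of temporal operators, where "always" is a smooth minimum over time steps and "eventually" is a smooth maximum, and it scores a 2-D trajectory directly rather than a dynamic movement primitive. All function names, the obstacle geometry, and the `sharpness` and `weight` parameters are hypothetical and chosen only for illustration.

```python
import math

def soft_min(values, sharpness=10.0):
    """Smooth lower bound on min(values) using log-sum-exp (assumed relaxation)."""
    m = min(values)
    # Shift by the true minimum so the exponentials stay numerically stable.
    total = sum(math.exp(-sharpness * (v - m)) for v in values)
    return m - math.log(total) / sharpness

def soft_max(values, sharpness=10.0):
    """Smooth upper bound on max(values), defined through soft_min."""
    return -soft_min([-v for v in values], sharpness)

def always_avoid(traj, center, radius, sharpness=10.0):
    """Robustness of 'always stay outside the obstacle': positive means satisfied."""
    margins = [math.dist(p, center) - radius for p in traj]
    return soft_min(margins, sharpness)

def eventually_reach(traj, goal, tol, sharpness=10.0):
    """Robustness of 'eventually come within tol of the goal'."""
    margins = [tol - math.dist(p, goal) for p in traj]
    return soft_max(margins, sharpness)

def spec_loss(robustness):
    """Hinge penalty: zero when the specification holds, linear in the violation otherwise."""
    return max(0.0, -robustness)

def imitation_loss(traj, demo):
    """Mean squared distance to the demonstration, which keeps the result close to it."""
    return sum(math.dist(p, q) ** 2 for p, q in zip(traj, demo)) / len(demo)

def total_loss(traj, demo, center, radius, weight=1.0):
    """Imitation term plus a weighted specification penalty (hypothetical combination)."""
    return imitation_loss(traj, demo) + weight * spec_loss(always_avoid(traj, center, radius))

# A straight-line demonstration that passes through a hypothetical obstacle,
# and a detour that stays clear of it.
obstacle, radius = (0.5, 0.0), 0.2
demo = [(i / 10, 0.0) for i in range(11)]
detour = [(i / 10, 0.5) for i in range(11)]

print(always_avoid(demo, obstacle, radius) < 0)    # the demonstration violates the spec
print(always_avoid(detour, obstacle, radius) > 0)  # the detour satisfies it
```

Because the smooth minimum is a lower bound on the true minimum, a positive robustness value here implies the hard specification also holds. The hinge in `spec_loss` is differentiable everywhere except at zero, so an automatic-differentiation framework could in principle minimize `total_loss` with respect to the parameters that generate `traj`; that optimization step is omitted from this sketch.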