Paper Title
Learning Control by Iterative Inversion
Paper Authors
Paper Abstract
We propose $\textit{iterative inversion}$ -- an algorithm for learning an inverse function without input-output pairs, using only samples from the desired output distribution and access to the forward function. The key challenge is a $\textit{distribution shift}$ between the desired outputs and the outputs of an initial random guess, and we prove that iterative inversion can steer the learning correctly, under rather strict conditions on the function. We apply iterative inversion to learn control. Our input is a set of demonstrations of desired behavior, given as video embeddings of trajectories (without actions), and our method iteratively learns to imitate trajectories generated by the current policy, perturbed by random exploration noise. Our approach does not require rewards, and only employs supervised learning, which can be easily scaled to use state-of-the-art trajectory embedding techniques and policy representations. Indeed, with a VQ-VAE embedding and a transformer-based policy, we demonstrate non-trivial continuous control on several tasks. Further, we report improved performance on imitating diverse behaviors compared to reward-based methods.
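To make the loop concrete, here is a minimal sketch of iterative inversion on a toy problem. Everything in it is an illustrative assumption rather than the paper's implementation: a scalar forward function `f`, a linear inverse model `g(y) = w*y + b` fit by least squares, and Gaussian exploration noise. In the paper's control setting, the forward function is an environment rollout, the outputs are trajectory video embeddings, and the inverse model is a transformer policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy forward function: we can query it, but we never invert it analytically.
    return np.tanh(x) + 0.5 * x

# Samples from the desired output distribution (the only supervision available).
y_desired = rng.uniform(0.5, 1.5, size=256)

w, b = 0.0, 0.0  # initial guess for the inverse model g(y) = w*y + b
for step in range(50):
    x = w * y_desired + b                         # query current inverse on desired outputs
    x = x + rng.normal(scale=0.1, size=x.shape)   # perturb with random exploration noise
    y = f(x)                                      # observe what the forward function produces
    # Supervised step: regress inputs on the outputs they actually produced,
    # i.e., fit g on the self-generated (y, x) pairs.
    A = np.stack([y, np.ones_like(y)], axis=1)
    w, b = np.linalg.lstsq(A, x, rcond=None)[0]

# If the iteration steered correctly, f(g(y)) should now be close to y_desired.
err = np.mean(np.abs(f(w * y_desired + b) - y_desired))
print(f"mean inversion error: {err:.4f}")
```

Note that the supervised step only ever uses self-generated pairs: as g improves, the outputs f(g(y)) drift from the initial guess toward the desired distribution, which is exactly the distribution-shift dynamic the abstract highlights.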