Paper Title
Offline Reinforcement Learning for Visual Navigation
Paper Authors
Paper Abstract
Reinforcement learning can enable robots to navigate to distant goals while optimizing user-specified reward functions, including preferences for following lanes, staying on paved paths, or avoiding freshly mowed grass. However, online learning from trial-and-error for real-world robots is logistically challenging, and methods that instead can utilize existing datasets of robotic navigation data could be significantly more scalable and enable broader generalization. In this paper, we present ReViND, the first offline RL system for robotic navigation that can leverage previously collected data to optimize user-specified reward functions in the real world. We evaluate our system for off-road navigation without any additional data collection or fine-tuning, and show that it can navigate to distant goals using only offline training from this dataset, and exhibit behaviors that qualitatively differ based on the user-specified reward function.
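To make the idea of a "user-specified reward function" concrete, below is a minimal sketch of the kind of reward the abstract describes: progress toward a goal combined with a terrain preference such as "stay on paved paths". This is an illustrative assumption, not ReViND's actual implementation; the function name, arguments, and the `on_paved_path` terrain flag are hypothetical.

```python
# Minimal sketch (not the authors' implementation) of a user-specified
# navigation reward: goal progress plus a weighted terrain preference.
import numpy as np


def user_reward(position, prev_position, goal, on_paved_path, terrain_weight=0.5):
    """Reward = progress toward the goal + bonus for preferred terrain.

    position, prev_position, goal: 2D numpy arrays (robot and goal coordinates).
    on_paved_path: bool from a hypothetical terrain classifier.
    terrain_weight: how strongly the user's terrain preference is weighted.
    """
    # Progress term: how much closer the robot got to the goal this step.
    progress = np.linalg.norm(prev_position - goal) - np.linalg.norm(position - goal)
    # Terrain term: +1 if on the preferred surface, -1 otherwise.
    terrain_bonus = 1.0 if on_paved_path else -1.0
    return progress + terrain_weight * terrain_bonus


# Example: a step that moves 1 m closer to the goal while staying on pavement.
r = user_reward(
    position=np.array([1.0, 0.0]),
    prev_position=np.array([0.0, 0.0]),
    goal=np.array([5.0, 0.0]),
    on_paved_path=True,
)
print(r)  # 1.0 (progress) + 0.5 (terrain bonus) = 1.5
```

Changing `terrain_weight` or the terrain flag would yield the qualitatively different behaviors the abstract refers to, without collecting any new data.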