Paper Title
Overcoming Exploration: Deep Reinforcement Learning for Continuous Control in Cluttered Environments from Temporal Logic Specifications
Paper Authors
Abstract
Model-free continuous control for robot navigation tasks using Deep Reinforcement Learning (DRL), which relies on noisy policies for exploration, is sensitive to the density of rewards. In practice, robots are usually deployed in cluttered environments containing many obstacles and narrow passageways. Designing dense, effective rewards is challenging, resulting in exploration issues during training. This problem becomes even more serious when tasks are described using temporal logic specifications. This work presents a deep policy gradient algorithm for controlling a robot with unknown dynamics operating in a cluttered environment when the task is specified as a Linear Temporal Logic (LTL) formula. To overcome the exploration challenge posed by the environment during training, we propose a novel path-planning-guided reward scheme that integrates sampling-based methods to effectively complete goal-reaching missions. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-goal-reaching tasks that are solved in a distributed manner. Our framework is shown to significantly improve the performance (effectiveness, efficiency) and exploration of robots tasked with complex missions in large-scale cluttered environments. A video demonstration can be found on YouTube: https://youtu.be/yMh_NUNWxho.
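To make the idea of a path-planning-guided reward concrete, below is a minimal sketch of how a dense reward could be shaped from waypoints produced by a sampling-based planner (e.g., RRT*). This is an illustration under our own assumptions, not the authors' implementation: the waypoint list, the progress bonus, the distance penalty weight, and the goal tolerance are all hypothetical choices.

```python
import math

def nearest_waypoint(pos, waypoints):
    """Index of the closest planner waypoint and the distance to it."""
    dists = [math.dist(pos, w) for w in waypoints]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]

def guided_reward(pos, prev_pos, waypoints, goal_tol=0.1):
    """Dense reward: progress along the planned path minus deviation from it.

    A large terminal bonus is given when the sub-goal (last waypoint) is
    reached, mirroring the sub-goal-reaching decomposition of the LTL task.
    All constants here are illustrative, not from the paper.
    """
    if math.dist(pos, waypoints[-1]) < goal_tol:
        return 10.0  # sub-goal reached
    i_now, d_now = nearest_waypoint(pos, waypoints)
    i_prev, _ = nearest_waypoint(prev_pos, waypoints)
    progress = i_now - i_prev      # waypoints advanced this step
    return progress - 0.1 * d_now  # reward progress, penalize deviation
```

Because the planner path threads through narrow passages, this reward stays informative exactly where a sparse goal-only reward would leave the noisy exploration policy stuck.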