Paper Title
Explainable robotic systems: Understanding goal-driven actions in a reinforcement learning scenario
Paper Authors
Paper Abstract
Robotic systems are more present in our society every day. In human-robot environments, it is crucial that end-users correctly understand their robotic team partners in order to collaboratively complete a task. To increase action understanding, users demand more explainability about the decisions made by the robot in particular situations. Recently, explainable robotic systems have emerged as an alternative focused not only on completing a task satisfactorily, but also on justifying, in a human-like manner, the reasons that lead to a decision. In reinforcement learning scenarios, a great deal of effort has been focused on providing explanations using data-driven approaches, particularly from the visual input modality in deep learning-based systems. In this work, we focus instead on the decision-making process of reinforcement learning agents performing a task in a robotic scenario. Experimental results are obtained using three different set-ups, namely, a deterministic navigation task, a stochastic navigation task, and a continuous visual-based object sorting task. As a way to explain the goal-driven robot's actions, we use the probability of success computed by three different proposed approaches: memory-based, learning-based, and introspection-based. The differences between these approaches are the amount of memory required to compute or estimate the probability of success and the kind of reinforcement learning representation in which they can be used. In this regard, we use the memory-based approach as a baseline, since it is obtained directly from the agent's observations. When comparing the learning-based and introspection-based approaches to this baseline, both are found to be suitable alternatives for computing the probability of success, obtaining high levels of similarity as measured by Pearson's correlation and the mean squared error.
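As an illustration of the comparison described in the abstract, the following Python sketch shows how a memory-based probability of success could be estimated as an empirical success rate from stored episode outcomes, and how two sets of estimates can be compared using Pearson's correlation and the mean squared error. The episode memory, state identifiers, and the placeholder values standing in for a learning- or introspection-based estimate are hypothetical and only meant to illustrate the metrics; this is not the paper's implementation.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical episode memory: state -> outcomes of past episodes passing
# through that state (True = the goal was eventually reached).
episode_memory = {
    0: [True, True, False, True],
    1: [True, False, False, False],
    2: [True, True, True, True],
    3: [False, False, True, False],
}

def memory_based_success_probability(outcomes):
    """Empirical success rate: fraction of stored episodes through this
    state that reached the goal (a memory-based baseline estimate)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

baseline = np.array(
    [memory_based_success_probability(o) for o in episode_memory.values()]
)

# Placeholder numbers standing in for another approach's estimates
# (e.g. learning-based or introspection-based), used only for illustration.
estimate = np.array([0.70, 0.30, 0.95, 0.20])

# Similarity measures mentioned in the abstract.
r, _ = pearsonr(baseline, estimate)
mse = np.mean((baseline - estimate) ** 2)
print(f"Pearson r = {r:.3f}, MSE = {mse:.4f}")
```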