Paper Title

Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments

Paper Authors

Koichiro Yoshino, Kohei Wakimoto, Yuta Nishimura, Satoshi Nakamura

Paper Abstract

Bridging robot action sequences and their natural language captions is an important task for increasing the explainability of human-assisting robots, a rapidly evolving field. In this paper, we propose a system for generating natural language captions that describe the behaviors of human-assisting robots. The system describes robot actions by using robot observations (histories from actuator systems and cameras), toward end-to-end bridging between robot actions and natural language captions. Two reasons make it challenging to apply existing sequence-to-sequence models to this mapping: 1) it is hard to prepare a large-scale dataset for every kind of robot and environment, and 2) there is a gap between the number of samples obtained from robot action observations and the number of words in the generated captions. We introduce unsupervised segmentation based on K-means clustering to unify typical robot observation patterns into classes. This method makes it possible for the network to learn the relationship from a small amount of data. Moreover, we utilize a chunking method based on byte-pair encoding (BPE) to fill the gap between the number of samples of robot action observations and the number of words in a caption. We also apply an attention mechanism to the segmentation task. Experimental results show that the proposed model based on unsupervised learning generates better descriptions than other methods. We also show that the attention mechanism did not work well in our low-resource setting.
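
To make the two techniques named in the abstract concrete, here is a minimal Python sketch. This is not the authors' released code; the function names and hyperparameters (n_clusters, num_merges, the merge threshold) are illustrative assumptions, and observations are assumed to be fixed-length feature vectors sampled over time.

```python
# Sketch (assumption, not the paper's implementation): unsupervised
# segmentation of robot observations via K-means, as described in the
# abstract. Each frame is assigned a cluster id, and runs of identical
# ids are merged into segments.
import numpy as np
from sklearn.cluster import KMeans

def segment_observations(observations: np.ndarray, n_clusters: int = 8):
    """Return (start, end, class_id) segments over the frame sequence."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(observations)
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t, int(labels[start])))
            start = t
    return segments

# Example: 200 frames of 16-dimensional actuator/camera features.
obs = np.random.rand(200, 16)
print(segment_observations(obs)[:5])
```

The BPE-based chunking can then be applied over the resulting class sequence, repeatedly merging the most frequent adjacent pair so the observation-side sequence shrinks toward the length of the caption word sequence. Again, this is only a sketch of the general BPE idea, not the paper's exact algorithm:

```python
# BPE-style chunking over a sequence of segment class ids (hypothetical
# sketch): merge the most frequent adjacent pair until no pair occurs
# at least twice or the merge budget is spent.
from collections import Counter

def bpe_chunk(class_seq, num_merges: int = 10):
    seq = [str(c) for c in class_seq]
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs or pairs.most_common(1)[0][1] < 2:
            break  # no pair occurs twice; nothing worth merging
        (a, b), _ = pairs.most_common(1)[0]
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                merged.append(a + "+" + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq

print(bpe_chunk([1, 2, 1, 2, 3, 1, 2, 3, 3]))
# -> ['1+2', '1+2+3', '1+2+3', '3']
```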
