Paper Title
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Paper Authors
Paper Abstract
Humans use all of their senses to accomplish different tasks in everyday activities. In contrast, existing work on robotic manipulation mostly relies on one, or occasionally two, modalities, such as vision and touch. In this work, we systematically study how visual, auditory, and tactile perception can jointly help robots solve complex manipulation tasks. We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor, with all three sensory modalities fused with a self-attention model. Results on two challenging tasks, dense packing and pouring, demonstrate the necessity and power of multisensory perception for robotic manipulation: vision displays the global status of the robot but can often suffer from occlusion, audio provides immediate feedback on key moments that are invisible to the camera, and touch offers precise local geometry for decision making. Leveraging all three modalities, our robotic system significantly outperforms prior methods.
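As context for the fusion approach the abstract mentions, the following is a minimal, hypothetical NumPy sketch of fusing three modality embeddings with single-head scaled dot-product self-attention. It is not the authors' implementation: the dimensions, random projection weights, and mean pooling are illustrative assumptions only.

```python
import numpy as np

def self_attention_fuse(tokens, d_k=None):
    """Fuse modality tokens with single-head scaled dot-product self-attention.

    tokens: (n_modalities, d) array, one embedding per sensory modality.
    Returns a (d_k,) fused feature vector (mean over attended tokens).
    """
    n, d = tokens.shape
    d_k = d_k or d
    rng = np.random.default_rng(0)
    # Illustrative random projections; a trained model would learn these.
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d_k)                  # (n, n) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over modalities
    attended = weights @ V                           # each modality attends to all
    return attended.mean(axis=0)                     # pool into one fused feature

# Example: hypothetical vision, audio, and touch embeddings of dimension 8.
vision, audio, touch = (np.full(8, v) for v in (1.0, 2.0, 3.0))
fused = self_attention_fuse(np.stack([vision, audio, touch]))
```

In a real system each embedding would come from a modality-specific encoder (e.g. a CNN over camera or tactile images, a spectrogram encoder for audio), and the attention weights would let the policy emphasize whichever modality is most informative at each step.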