Learning Object Permanence from Video

Paper title

Learning Object Permanence from Video

Authors

Aviv Shamsian, Ofri Kleinfeld, Amir Globerson, Gal Chechik

Abstract

Object permanence allows people to reason about the location of non-visible objects by understanding that they continue to exist even when not perceived directly. Object permanence is critical for building a model of the world, since objects in natural visual scenes dynamically occlude and contain each other. Intensive studies in developmental psychology suggest that object permanence is a challenging task that is learned through extensive experience. Here we introduce the setup of learning object permanence from data. We explain why this learning problem should be dissected into four components, where objects are (1) visible, (2) occluded, (3) contained by another object and (4) carried by a containing object. The fourth subtask, where a target object is carried by a containing object, is particularly challenging because it requires a system to reason about the moving location of an invisible object. We then present a unified deep architecture that learns to predict object location under these four scenarios. We evaluate the architecture and system on a new dataset based on CATER, and find that it outperforms previous localization methods and various baselines.
