Paper Title

Understanding Spatial Relations through Multiple Modalities

Authors

Soham Dan, Hangfeng He, Dan Roth

Abstract

Recognizing spatial relations and reasoning about them is essential in many applications, including navigation, direction giving, and human-computer interaction in general. Spatial relations between objects can be either explicit (expressed by spatial prepositions) or implicit (expressed by spatial verbs such as moving, walking, or shifting). Both kinds, but implicit relations in particular, require significant common-sense understanding. In this paper, we introduce the task of inferring implicit and explicit spatial relations between two entities in an image. We design a model that uses both textual and visual information to predict the spatial relations, making use of the positional and size information of objects as well as image embeddings. We contrast our spatial model with powerful language models and show how our modeling complements them, improving prediction accuracy and coverage and facilitating the handling of unseen subjects, objects, and relations.
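
The abstract does not say how positional and size information is encoded. As a rough illustration only, the sketch below derives a few such features from two bounding boxes. The `Box` type, the `pair_features` function, and the particular features chosen are assumptions made for this example, not the authors' design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned bounding box in pixels (hypothetical type for this sketch)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def pair_features(subj: Box, obj: Box, img_w: float, img_h: float) -> List[float]:
    """Illustrative position and size features for a (subject, object) pair."""
    # Box centres, normalised by the image dimensions.
    sx = (subj.x + subj.w / 2) / img_w
    sy = (subj.y + subj.h / 2) / img_h
    ox = (obj.x + obj.w / 2) / img_w
    oy = (obj.y + obj.h / 2) / img_h
    # Box areas as a fraction of the image area.
    s_area = (subj.w * subj.h) / (img_w * img_h)
    o_area = (obj.w * obj.h) / (img_w * img_h)
    # Centres, centre offsets (subject minus object), and relative sizes.
    return [sx, sy, ox, oy, sx - ox, sy - oy, s_area, o_area]


# Subject in the top-left quadrant, object in the bottom-right quadrant.
features = pair_features(Box(0, 0, 50, 50), Box(50, 50, 50, 50), 100, 100)
```

In image coordinates, where y grows downward, a negative vertical offset means the subject's centre lies above the object's centre. A vector like this could be concatenated with text and image embeddings before classification, though that wiring is likewise an assumption here.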
