Paper Title
AVR: Attention based Salient Visual Relationship Detection
Paper Authors
Paper Abstract
Visual relationship detection aims to locate objects in images and recognize the relationships between them. Traditional methods treat all observed relationships in an image equally, which leads to relatively poor performance on complex images that contain abundant visual objects and diverse relationships. To address this problem, we propose an attention-based model, namely AVR, to detect salient visual relationships based on both the local and global context of the relationships. Specifically, AVR recognizes relationships and measures their attention in the local context of an input image by fusing the visual features, semantic information, and spatial information of the relationships. AVR then applies this attention to assign larger salience weights to important relationships for effective information filtering. Furthermore, AVR incorporates prior knowledge from the global context of image datasets to improve the precision of relationship prediction, where the context is modeled as a heterogeneous graph and the prior probability of relationships is measured with a random walk algorithm. Comprehensive experiments on several real-world image datasets demonstrate the effectiveness of AVR, and the results show that it significantly outperforms state-of-the-art visual relationship detection methods by up to $87.5\%$ in terms of recall.
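The abstract only summarizes the global-context prior. As a rough illustration (not the authors' implementation), the sketch below shows one way a random walk with restart over a heterogeneous graph of object categories and predicates could score the prior probability of a (subject, predicate, object) triple. The graph construction, restart parameter, and all function names here are illustrative assumptions.

```python
# Hypothetical sketch of a random-walk prior over a heterogeneous graph whose
# nodes are object categories and predicates; edge weights count how often a
# category and a predicate co-occur in training triples. This is an assumed
# formulation for illustration, not the exact AVR model.
import numpy as np

def build_transition_matrix(triples, num_objects, num_predicates):
    """Row-stochastic transition matrix over object and predicate nodes.

    `triples` is an iterable of (subject_id, predicate_id, object_id) tuples
    from training annotations; object nodes occupy indices [0, num_objects),
    predicate nodes occupy [num_objects, num_objects + num_predicates).
    """
    n = num_objects + num_predicates
    adj = np.zeros((n, n), dtype=np.float64)
    for s, p, o in triples:
        p_node = num_objects + p
        # Connect subject<->predicate and predicate<->object,
        # accumulating co-occurrence counts as edge weights.
        adj[s, p_node] += 1.0
        adj[p_node, s] += 1.0
        adj[p_node, o] += 1.0
        adj[o, p_node] += 1.0
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # keep isolated nodes from dividing by zero
    return adj / row_sums

def random_walk_with_restart(trans, seed, restart=0.15, iters=50):
    """Visiting distribution of a walk that restarts at the `seed` node."""
    n = trans.shape[0]
    p = np.full(n, 1.0 / n)
    e = np.zeros(n)
    e[seed] = 1.0
    for _ in range(iters):
        p = (1.0 - restart) * trans.T @ p + restart * e
    return p

def relationship_prior(trans, subject_id, predicate_id, object_id, num_objects):
    """Score a triple by how strongly a walk from the subject node visits
    the predicate node and the object node."""
    visits = random_walk_with_restart(trans, seed=subject_id)
    return visits[num_objects + predicate_id] * visits[object_id]

# Toy usage with two object categories (0: person, 1: horse) and one predicate (0: ride).
triples = [(0, 0, 1), (0, 0, 1), (1, 0, 0)]
T = build_transition_matrix(triples, num_objects=2, num_predicates=1)
print(relationship_prior(T, subject_id=0, predicate_id=0, object_id=1, num_objects=2))
```

In such a setup, the resulting prior would typically be combined with the attention-weighted local prediction to re-rank candidate relationships; the exact fusion used by AVR is described in the paper itself.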