Paper Title

Two-stream Hierarchical Similarity Reasoning for Image-text Matching

Authors

Ran Chen, Hanli Wang, Lei Wang, Sam Kwong

Abstract

Reasoning-based approaches have demonstrated their strength on the task of image-text matching. This work addresses two issues in image-text matching. First, in the reasoning process, conventional approaches cannot find and use multi-level hierarchical similarity information. To solve this problem, a hierarchical similarity reasoning module is proposed to automatically extract context information, which is then exploited jointly with local interaction information for efficient reasoning. Second, previous approaches only learn single-stream similarity alignment (i.e., at the image-to-text level or at the text-to-image level), which is inadequate for making full use of similarity information in image-text matching. To address this issue, a two-stream architecture is developed that decomposes image-text matching into image-to-text level and text-to-image level similarity computation. Both issues are addressed within a unified framework trained end to end, namely the two-stream hierarchical similarity reasoning network. Extensive experiments on two benchmark datasets, MSCOCO and Flickr30K, show the superiority of the proposed approach over existing state-of-the-art methods.
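
To make the two-stream idea concrete, the following is a minimal sketch of how an image-to-text score and a text-to-image score might be computed separately and then combined. It is not the authors' network: the hierarchical similarity reasoning module is omitted, and a generic cross-attention aggregation is swapped in for it. The function names (`stream_similarity`, `two_stream_similarity`), the `temperature` value, and the choice to sum the two stream scores are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stream_similarity(queries, contexts, temperature=9.0):
    """One stream: each query fragment attends over all context fragments.

    queries:  (n_q, d) array, e.g. image region features
    contexts: (n_c, d) array, e.g. word features
    Returns the mean cosine similarity between each query and its
    attended context vector (a hypothetical aggregation choice).
    """
    q = l2_normalize(queries)
    c = l2_normalize(contexts)
    attn = softmax(temperature * (q @ c.T), axis=1)  # (n_q, n_c) weights
    attended = l2_normalize(attn @ c)                # (n_q, d) context per query
    local_sims = np.sum(q * attended, axis=1)        # cosine per query fragment
    return float(local_sims.mean())


def two_stream_similarity(regions, words, temperature=9.0):
    """Combine the image-to-text and text-to-image streams by summation (assumed)."""
    i2t = stream_similarity(regions, words, temperature)  # regions attend to words
    t2i = stream_similarity(words, regions, temperature)  # words attend to regions
    return i2t + t2i, i2t, t2i


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(36, 64))  # placeholder region features
    words = rng.normal(size=(12, 64))    # placeholder word features
    score, i2t, t2i = two_stream_similarity(regions, words)
    print(score, i2t, t2i)
```

The two streams generally give different values for the same pair, because the attention weights are normalized over different sets of fragments. That asymmetry is the reason for computing both directions rather than one.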
