Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval

Title

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval

Authors

Rao, Jun; Wang, Fei; Ding, Liang; Qi, Shuhan; Zhan, Yibing; Liu, Weifeng; Tao, Dacheng

Abstract

This article aims to provide the information retrieval community with some reflections on recent advances in retrieval learning by analyzing the reproducibility of image-text retrieval models. Owing to the growth of multimodal data over the last decade, image-text retrieval has steadily become a major research direction in the field of information retrieval. Numerous researchers train and evaluate image-text retrieval algorithms on benchmark datasets such as MS-COCO and Flickr30k. Past research has mostly focused on performance, proposing a variety of state-of-the-art methods. According to their authors, these techniques provide improved modality interactions and hence more precise multimodal representations. In contrast to previous works, we focus on the reproducibility of these approaches and on examining the factors that lead to improved performance of pretrained and nonpretrained models in image-text retrieval. To be more specific, we first examine the related reproducibility concerns and explain why we focus on image-text retrieval tasks. Second, we systematically summarize the current paradigm of image-text retrieval models and the stated contributions of those approaches. Third, we analyze various aspects of reproducing pretrained and nonpretrained retrieval models. To this end, we conduct ablation experiments and identify several factors that affect retrieval recall more than the improvements claimed in the original papers. Finally, we present some reflections and challenges that the retrieval community should consider in the future. Our source code is publicly available at https://github.com/WangFei-2019/Image-text-Retrieval.
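The abstract measures its findings in terms of retrieval recall. As a rough illustration of what that metric usually means in image-text retrieval, here is a minimal Recall@K sketch in plain Python. It is not the authors' evaluation code from the linked repository. The similarity-matrix layout, the helper names (`recall_i2t`, `recall_t2i`, `rsum`), and the "rsum" summary (sum of R@1/5/10 over both directions, a convention common in this line of work) are all assumptions made for illustration. MS-COCO and Flickr30k typically pair each image with 5 captions. The toy data below uses 2 captions per image to keep it small.

```python
from typing import Sequence

# Hypothetical layout (an assumption, not taken from the paper's code):
# sim[i][j] is the similarity between image i and caption j, and the
# captions of image i occupy indices i*c .. i*c + c - 1 (c captions per image).
Matrix = Sequence[Sequence[float]]


def _ranked(scores: Sequence[float]) -> list:
    # Indices sorted by descending score; Python's sort is stable even with
    # reverse=True, so tied scores keep the lower index first.
    return sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)


def recall_i2t(sim: Matrix, c: int, k: int) -> float:
    """Image-to-text Recall@K: a hit if any of the image's c captions is in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        if len(row) != len(sim) * c:
            raise ValueError("expected len(sim) * c captions per row")
        top_k = set(_ranked(row)[:k])
        if any(j in top_k for j in range(i * c, (i + 1) * c)):
            hits += 1
    return 100.0 * hits / len(sim)


def recall_t2i(sim: Matrix, c: int, k: int) -> float:
    """Text-to-image Recall@K: a hit if the caption's own image is in the top k."""
    n_images, n_captions = len(sim), len(sim[0])
    hits = 0
    for j in range(n_captions):
        column = [sim[i][j] for i in range(n_images)]
        if j // c in _ranked(column)[:k]:
            hits += 1
    return 100.0 * hits / n_captions


def rsum(sim: Matrix, c: int, ks: Sequence[int] = (1, 5, 10)) -> float:
    """Sum of Recall@K over both directions (assumed summary convention)."""
    return sum(recall_i2t(sim, c, k) + recall_t2i(sim, c, k) for k in ks)


if __name__ == "__main__":
    # Toy example: 2 images with 2 captions each.
    toy = [
        [0.9, 0.1, 0.8, 0.2],  # image 0 -> captions 0, 1
        [0.3, 0.7, 0.4, 0.6],  # image 1 -> captions 2, 3
    ]
    print(recall_i2t(toy, c=2, k=1))  # 50.0
    print(recall_t2i(toy, c=2, k=1))  # 50.0
    print(rsum(toy, c=2))             # 500.0
```

Details such as caption grouping and tie-breaking are fixed here by assumption. An actual evaluation script may choose differently, which is one reason evaluation code itself is part of what a reproducibility study has to check.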
