Paper Title
Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing
Paper Authors
Paper Abstract
Due to the availability of large-scale archives of multi-modal data (e.g., satellite images acquired by different sensors, text sentences, etc.), the development of cross-modal retrieval systems that can search and retrieve semantically relevant data across modalities based on a query in any single modality has attracted great attention in remote sensing (RS). In this paper, we focus on cross-modal text-image retrieval, where a query from one modality (e.g., text) can be matched to archive entries from another (e.g., image). Most existing cross-modal text-image retrieval systems require a large number of labeled training samples and, due to their intrinsic characteristics, do not allow fast and memory-efficient retrieval. These issues limit the applicability of existing cross-modal retrieval systems to large-scale applications in RS. To address this problem, in this paper we introduce a novel deep unsupervised cross-modal contrastive hashing (DUCH) method for RS text-image retrieval. The proposed DUCH consists of two main modules: 1) a feature extraction module, which extracts deep representations of the text and image modalities; and 2) a hashing module, which learns to generate cross-modal binary hash codes from the extracted representations. Within the hashing module, we introduce a novel multi-objective loss function that includes: i) contrastive objectives that preserve both intra- and inter-modal similarities; ii) an adversarial objective enforced across the two modalities to ensure cross-modal representation consistency; and iii) binarization objectives for generating representative hash codes. Experimental results show that the proposed DUCH outperforms state-of-the-art unsupervised cross-modal hashing methods on two multi-modal (image and text) benchmark archives in RS. Our code is publicly available at https://git.tu-berlin.de/rsim/duch.
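
To make the loss composition described in the abstract concrete, below is a minimal PyTorch sketch of a DUCH-style multi-objective hashing loss. This is an illustrative sketch, not the authors' implementation: all names (nt_xent, ModalityDiscriminator, binarization_loss), the network shapes, the temperature, and the weighting coefficients are assumptions and are not taken from the paper or from the released code at the URL above.

```python
# Hypothetical sketch of a DUCH-style multi-objective hashing loss.
# All module/function names and hyper-parameters are illustrative
# assumptions, not taken from the paper or its released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def nt_xent(z_a, z_b, temperature=0.5):
    # Contrastive (NT-Xent-style) objective: matching pairs lie on the
    # diagonal of the cosine-similarity matrix and are pulled together,
    # while off-diagonal pairs are pushed apart.
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

class ModalityDiscriminator(nn.Module):
    # Tries to tell image codes (label 1) from text codes (label 0).
    def __init__(self, code_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, h):
        return self.net(h)

def generator_adversarial_loss(disc, h_img, h_txt):
    # Generator-side adversarial objective with flipped labels: the hashing
    # networks are rewarded when the discriminator cannot separate modalities.
    p_img, p_txt = disc(h_img), disc(h_txt)
    return (F.binary_cross_entropy_with_logits(p_img, torch.zeros_like(p_img))
            + F.binary_cross_entropy_with_logits(p_txt, torch.ones_like(p_txt)))

def binarization_loss(h):
    # Push continuous code entries toward the {-1, +1} hypercube vertices,
    # so that sign(h) loses little information at retrieval time.
    return ((h.abs() - 1.0) ** 2).mean()

# Toy usage with random stand-ins for the extracted deep representations.
torch.manual_seed(0)
n, code_dim = 8, 16
h_img = torch.tanh(torch.randn(n, code_dim))  # continuous image hash outputs
h_txt = torch.tanh(torch.randn(n, code_dim))  # continuous text hash outputs
disc = ModalityDiscriminator(code_dim)

inter = nt_xent(h_img, h_txt)  # inter-modal term; the intra-modal terms
                               # would apply nt_xent to two augmented views
                               # of the same modality.
adv = generator_adversarial_loss(disc, h_img, h_txt)
quant = binarization_loss(h_img) + binarization_loss(h_txt)
loss = inter + 0.1 * adv + 0.1 * quant  # weights are placeholders
print(f"total loss: {loss.item():.4f}")
```

At retrieval time, the binary codes sign(h) would be compared with Hamming distance, which is what makes hashing-based retrieval fast and memory-efficient compared with searching over continuous deep representations.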