Paper Title
Semi-Supervised Cross-Modal Salient Object Detection with U-Structure Networks
Paper Authors
Paper Abstract
Salient Object Detection (SOD) is a popular and important topic that aims to precisely detect and segment the visually salient regions of an image. We integrate linguistic information into vision-based U-structure networks designed for salient object detection. Our experiments are based on the newly created DUTS Cross Modal (DUTS-CM) dataset, which contains both visual and linguistic labels. We propose a new module, called efficient Cross-Modal Self-Attention (eCMSA), to combine visual and linguistic features and improve the performance of the original U-structure networks. Meanwhile, to reduce the heavy burden of labeling, we adopt a semi-supervised learning approach: an image captioning model trained on DUTS-CM automatically labels other datasets such as DUT-OMRON and HKU-IS. Comprehensive experiments show that natural language input improves SOD performance, and that our method is competitive with other SOD approaches.
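The abstract does not spell out eCMSA's internals, but its stated role is to fuse linguistic features into the visual feature maps of a U-structure network. The following is a minimal, illustrative sketch of that idea under assumptions of our own: the dimensions, the use of `nn.MultiheadAttention`, and the residual fusion step are not taken from the paper.

```python
# Illustrative sketch of a cross-modal attention block in the spirit of eCMSA.
# All design details (dimensions, MultiheadAttention, residual fusion) are
# assumptions; the paper's abstract does not specify the module's internals.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, vis_dim: int = 256, txt_dim: int = 300, heads: int = 4):
        super().__init__()
        # Project linguistic token features into the visual feature space.
        self.txt_proj = nn.Linear(txt_dim, vis_dim)
        # Visual positions (queries) attend over projected text tokens (keys/values).
        self.attn = nn.MultiheadAttention(vis_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # vis_feat: (B, C, H, W) feature map from one U-structure encoder stage.
        # txt_feat: (B, L, txt_dim) token embeddings of the image caption.
        b, c, h, w = vis_feat.shape
        q = vis_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        kv = self.txt_proj(txt_feat)              # (B, L, C)
        fused, _ = self.attn(q, kv, kv)           # text-conditioned visual features
        out = self.norm(q + fused)                # residual fusion + normalization
        return out.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    vis = torch.randn(2, 256, 16, 16)
    txt = torch.randn(2, 12, 300)   # e.g., a 12-token caption embedding
    print(CrossModalAttention()(vis, txt).shape)  # torch.Size([2, 256, 16, 16])
```

Keeping the output shape identical to the input feature map means such a block can be dropped into an existing U-structure stage without altering the rest of the network.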
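The semi-supervised step is likewise only outlined in the abstract: a captioning model trained on DUTS-CM generates linguistic pseudo-labels for datasets that lack them. A hypothetical sketch of that labeling loop is below; `CaptionModel`-style objects, the `generate` method, the `preprocess` callable, and the file layout are all illustrative assumptions, not the paper's interface.

```python
# Hypothetical sketch of the pseudo-labeling step: a caption model trained on
# DUTS-CM annotates unlabeled SOD datasets such as DUT-OMRON or HKU-IS.
# The captioner's API and the on-disk layout are assumptions for illustration.
from pathlib import Path
import torch

@torch.no_grad()
def pseudo_label(captioner, image_dir: str, out_file: str, preprocess) -> None:
    """Write one generated caption per image, usable as a linguistic label."""
    captioner.eval()
    with open(out_file, "w", encoding="utf-8") as f:
        for img_path in sorted(Path(image_dir).glob("*.jpg")):
            image = preprocess(img_path)          # -> (1, 3, H, W) tensor, assumed
            caption = captioner.generate(image)   # assumed decoding interface
            f.write(f"{img_path.name}\t{caption}\n")

# e.g., pseudo_label(captioner, "DUT-OMRON/images", "dut_omron_captions.tsv", preprocess)
```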