Paper Title
Semi-Supervised Cross-Modal Salient Object Detection with U-Structure Networks
Paper Authors
Paper Abstract
Salient Object Detection (SOD) is a popular and important topic that aims to precisely detect and segment the visually salient regions of an image. We integrate linguistic information into vision-based U-structure networks designed for salient object detection. Our experiments are based on the newly created DUTS Cross Modal (DUTS-CM) dataset, which contains both visual and linguistic labels. We propose a new module, called efficient Cross-Modal Self-Attention (eCMSA), to combine visual and linguistic features and improve the performance of the original U-structure networks. Meanwhile, to reduce the heavy burden of labeling, we adopt a semi-supervised learning approach: an image captioning model trained on DUTS-CM automatically labels other datasets such as DUT-OMRON and HKU-IS. Comprehensive experiments show that natural language input improves SOD performance, and that our method is competitive with other SOD approaches.
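The abstract does not spell out eCMSA's internals, but its stated role is to fuse linguistic features into the visual feature maps of a U-structure network. The following is a minimal, illustrative sketch of that idea under assumptions of our own: the dimensions, the use of `nn.MultiheadAttention`, and the residual fusion step are not taken from the paper.

```python
# Illustrative sketch of a cross-modal attention block in the spirit of eCMSA.
# All design details (dimensions, MultiheadAttention, residual fusion) are
# assumptions; the paper's abstract does not specify the module's internals.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, vis_dim: int = 256, txt_dim: int = 300, heads: int = 4):
        super().__init__()
        # Project linguistic token features into the visual feature space.
        self.txt_proj = nn.Linear(txt_dim, vis_dim)
        # Visual positions (queries) attend over projected text tokens (keys/values).
        self.attn = nn.MultiheadAttention(vis_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # vis_feat: (B, C, H, W) feature map from one U-structure encoder stage.
        # txt_feat: (B, L, txt_dim) token embeddings of the image caption.
        b, c, h, w = vis_feat.shape
        q = vis_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        kv = self.txt_proj(txt_feat)              # (B, L, C)
        fused, _ = self.attn(q, kv, kv)           # text-conditioned visual features
        out = self.norm(q + fused)                # residual fusion + normalization
        return out.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    vis = torch.randn(2, 256, 16, 16)
    txt = torch.randn(2, 12, 300)   # e.g., a 12-token caption embedding
    print(CrossModalAttention()(vis, txt).shape)  # torch.Size([2, 256, 16, 16])
```

Keeping the output shape identical to the input feature map means such a block can be dropped into an existing U-structure stage without altering the rest of the network.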
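The semi-supervised step is likewise only outlined in the abstract: a captioning model trained on DUTS-CM generates linguistic pseudo-labels for datasets that lack them. A hypothetical sketch of that labeling loop is below; `CaptionModel`-style objects, the `generate` method, the `preprocess` callable, and the file layout are all illustrative assumptions, not the paper's interface.

```python
# Hypothetical sketch of the pseudo-labeling step: a caption model trained on
# DUTS-CM annotates unlabeled SOD datasets such as DUT-OMRON or HKU-IS.
# The captioner's API and the on-disk layout are assumptions for illustration.
from pathlib import Path
import torch

@torch.no_grad()
def pseudo_label(captioner, image_dir: str, out_file: str, preprocess) -> None:
    """Write one generated caption per image, usable as a linguistic label."""
    captioner.eval()
    with open(out_file, "w", encoding="utf-8") as f:
        for img_path in sorted(Path(image_dir).glob("*.jpg")):
            image = preprocess(img_path)          # -> (1, 3, H, W) tensor, assumed
            caption = captioner.generate(image)   # assumed decoding interface
            f.write(f"{img_path.name}\t{caption}\n")

# e.g., pseudo_label(captioner, "DUT-OMRON/images", "dut_omron_captions.tsv", preprocess)
```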