Paper Title


Depth Quality-Inspired Feature Manipulation for Efficient RGB-D and Video Salient Object Detection

Paper Authors

Wenbo Zhang, Keren Fu, Zhuo Wang, Ge-Peng Ji, Qijun Zhao

Abstract


Recently CNN-based RGB-D salient object detection (SOD) has obtained significant improvement on detection accuracy. However, existing models often fail to perform well in terms of efficiency and accuracy simultaneously. This hinders their potential applications on mobile devices as well as many real-world problems. To bridge the accuracy gap between lightweight and large models for RGB-D SOD, in this paper, an efficient module that can greatly improve the accuracy but adds little computation is proposed. Inspired by the fact that depth quality is a key factor influencing the accuracy, we propose an efficient depth quality-inspired feature manipulation (DQFM) process, which can dynamically filter depth features according to depth quality. The proposed DQFM resorts to the alignment of low-level RGB and depth features, as well as holistic attention of the depth stream to explicitly control and enhance cross-modal fusion. We embed DQFM to obtain an efficient lightweight RGB-D SOD model called DFM-Net, where we in addition design a tailored depth backbone and a two-stage decoder as basic parts. Extensive experimental results on nine RGB-D datasets demonstrate that our DFM-Net outperforms recent efficient models, running at about 20 FPS on CPU with only 8.5Mb model size, and meanwhile being 2.9/2.4 times faster and 6.7/3.1 times smaller than the latest best models A2dele and MobileSal. It also maintains state-of-the-art accuracy when even compared to non-efficient models. Interestingly, further statistics and analyses verify the ability of DQFM in distinguishing depth maps of various qualities without any quality labels. Last but not least, we further apply DFM-Net to deal with video SOD (VSOD), achieving comparable performance against recent efficient models while being 3/2.3 times faster/smaller than the prior best in this field. Our code is available at https://github.com/zwbx/DFM-Net.
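The abstract describes DQFM as gating depth features by an estimated depth quality, derived from the alignment of low-level RGB and depth features, before cross-modal fusion. The sketch below illustrates that idea only at a conceptual level; the function names, the cosine-similarity quality proxy, and the additive fusion are illustrative assumptions, not the paper's actual implementation (which operates on CNN feature maps with learned attention).

```python
# Conceptual sketch of quality-gated cross-modal fusion in the spirit of DQFM.
# All names and the exact gate computation are illustrative assumptions.

def cosine_similarity(a, b):
    """Cosine similarity between two flat feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def dqfm_gate(rgb_low, depth_low):
    """Estimate a depth-quality gate in [0, 1] from how well low-level
    RGB and depth features align (a stand-in for the paper's learned gate)."""
    sim = cosine_similarity(rgb_low, depth_low)
    return max(0.0, min(1.0, 0.5 * (sim + 1.0)))  # map [-1, 1] -> [0, 1]

def fuse(rgb_feat, depth_feat, gate):
    """Quality-gated fusion: attenuate the depth contribution when the
    estimated depth quality is low, so poor depth maps hurt less."""
    return [r + gate * d for r, d in zip(rgb_feat, depth_feat)]
```

With well-aligned features the gate approaches 1 and depth contributes fully; with contradictory features it approaches 0 and fusion falls back toward the RGB stream, mirroring the abstract's claim that DQFM "dynamically filters depth features according to depth quality" without any quality labels.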
