Fast Video Salient Object Detection via Spatiotemporal Knowledge Distillation

Paper Title

Fast Video Salient Object Detection via Spatiotemporal Knowledge Distillation

Authors

Yi Tang, Yuanman Li, Wenbin Zou

Abstract

With the wide adoption of deep learning frameworks in video salient object detection, the accuracy of recent approaches has improved markedly. These approaches mainly rely on sequential modules, based on optical flow or recurrent neural networks (RNNs), to learn robust spatiotemporal features. These modules are effective but significantly increase the computational burden of the corresponding deep models. In this paper, to simplify the network while maintaining accuracy, we present a lightweight network tailored for video salient object detection through spatiotemporal knowledge distillation. Specifically, in the spatial aspect, we combine a saliency guidance feature embedding structure with spatial knowledge distillation to refine the spatial features. In the temporal aspect, we propose a temporal knowledge distillation strategy, which allows the network to learn robust temporal features through infer-frame feature encoding and by distilling information from adjacent frames. Experiments on widely used video datasets (e.g., DAVIS, DAVSOD, SegTrack-V2) show that our approach achieves competitive performance. Furthermore, because it does not use complex sequential modules, the proposed network is highly efficient, running at 0.01 s per frame.
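For readers unfamiliar with knowledge distillation, the general idea behind the spatial branch can be illustrated with a minimal sketch. This is not the authors' loss: the abstract does not specify one. The example assumes a common formulation in which a lightweight student's saliency map is supervised both by the ground-truth mask and by a larger teacher network's soft saliency map. The function names `bce` and `distillation_loss`, the weighting parameter `alpha`, and the use of binary cross-entropy are all assumptions made for illustration.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two 2-D maps with values in [0, 1]."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # Clamp predictions so that log() never receives 0.
            p = min(max(p, eps), 1.0 - eps)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def distillation_loss(student_map, teacher_map, ground_truth, alpha=0.5):
    """Hypothetical distillation objective (an assumption, not the paper's).

    alpha weights the hard term (student vs. ground-truth mask);
    1 - alpha weights the soft term (student vs. teacher saliency map).
    """
    hard_term = bce(student_map, ground_truth)
    soft_term = bce(student_map, teacher_map)
    return alpha * hard_term + (1.0 - alpha) * soft_term


if __name__ == "__main__":
    # Toy 2x2 saliency maps; all values are made up for illustration.
    ground_truth = [[1.0, 0.0], [0.0, 1.0]]
    teacher = [[0.9, 0.1], [0.2, 0.8]]
    student = [[0.7, 0.3], [0.4, 0.6]]
    print(distillation_loss(student, teacher, ground_truth))
```

A temporal variant could, under the same assumptions, replace the teacher map for the current frame with teacher outputs or features computed from adjacent frames, so that the student learns temporal cues without an explicit optical-flow or recurrent module at inference time.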
