Paper Title
Exploring global diverse attention via pairwise temporal relation for video summarization
Paper Authors
Paper Abstract
Video summarization is an effective way to facilitate video searching and browsing. Most existing systems employ encoder-decoder based recurrent neural networks, which fail to explicitly diversify the system-generated summary frames while requiring intensive computation. In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention, called SUM-GDA, which adapts the attention mechanism to a global perspective to consider pairwise temporal relations of video frames. In particular, the GDA module has two advantages: 1) it models the relations within paired frames as well as the relations among all pairs, thus capturing global attention across all frames of one video; 2) it reflects the importance of each frame to the whole video, leading to diverse attention on these frames. Thus, SUM-GDA is beneficial for generating diverse frames to form a satisfactory video summary. Extensive experiments on three data sets, i.e., SumMe, TVSum, and VTW, demonstrate that SUM-GDA and its extension outperform other competing state-of-the-art methods with remarkable improvements. In addition, the proposed models can be run in parallel with significantly lower computational cost, which facilitates deployment in highly demanding applications.
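The abstract's core idea of scoring frames through pairwise temporal relations normalized over all frame pairs can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' SUM-GDA implementation: the function name, feature shapes, and the choice of dot-product similarity with a single global softmax over all pairs are all hypothetical, chosen only to show how "global" (all-pair) normalization yields per-frame importance scores.

```python
import numpy as np

def global_diverse_attention(features):
    """Toy sketch of global pairwise-attention frame scoring (not the paper's exact model).

    features: (T, d) array, one feature vector per video frame.
    Returns a length-T array of importance scores rescaled to [0, 1].
    """
    T, d = features.shape
    # Pairwise relation scores for every ordered frame pair (global view).
    sim = features @ features.T / np.sqrt(d)  # (T, T)
    np.fill_diagonal(sim, -np.inf)            # ignore self-pairs
    # Normalize over ALL pairs at once, not per row, so the attention
    # distribution is global across the whole video.
    attn = np.exp(sim - np.max(sim))
    attn /= attn.sum()
    # A frame's importance: total attention mass it receives from other frames.
    importance = attn.sum(axis=0)
    # Rescale to [0, 1] for use as keyframe-selection scores.
    span = importance.max() - importance.min() + 1e-8
    return (importance - importance.min()) / span
```

Because the softmax runs over the full T x T relation matrix, frames that attract attention from many distinct frames score high, while near-duplicate frames split the same attention mass, which is one plausible reading of how diverse attention discourages redundant keyframes.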