Paper Title
Exploring global diverse attention via pairwise temporal relation for video summarization
Paper Authors
Paper Abstract
Video summarization is an effective way to facilitate video searching and browsing. Most existing systems employ encoder-decoder based recurrent neural networks, which fail to explicitly diversify the system-generated summary frames while requiring intensive computation. In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention, called SUM-GDA, which adapts the attention mechanism to a global perspective to consider pairwise temporal relations of video frames. In particular, the GDA module has two advantages: 1) it models the relations within paired frames as well as the relations among all pairs, thus capturing global attention across all frames of one video; 2) it reflects the importance of each frame to the whole video, leading to diverse attention on these frames. Thus, SUM-GDA is beneficial for generating diverse frames to form a satisfactory video summary. Extensive experiments on three data sets, i.e., SumMe, TVSum, and VTW, demonstrate that SUM-GDA and its extension outperform other competing state-of-the-art methods with remarkable improvements. In addition, the proposed models can be run in parallel with significantly lower computational cost, which facilitates deployment in highly demanding applications.
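The abstract's core idea of scoring frames through pairwise temporal relations normalized over all frame pairs can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' SUM-GDA implementation: the function name, feature shapes, and the choice of dot-product similarity with a single global softmax over all pairs are all hypothetical, chosen only to show how "global" (all-pair) normalization yields per-frame importance scores.

```python
import numpy as np

def global_diverse_attention(features):
    """Toy sketch of global pairwise-attention frame scoring (not the paper's exact model).

    features: (T, d) array, one feature vector per video frame.
    Returns a length-T array of importance scores rescaled to [0, 1].
    """
    T, d = features.shape
    # Pairwise relation scores for every ordered frame pair (global view).
    sim = features @ features.T / np.sqrt(d)  # (T, T)
    np.fill_diagonal(sim, -np.inf)            # ignore self-pairs
    # Normalize over ALL pairs at once, not per row, so the attention
    # distribution is global across the whole video.
    attn = np.exp(sim - np.max(sim))
    attn /= attn.sum()
    # A frame's importance: total attention mass it receives from other frames.
    importance = attn.sum(axis=0)
    # Rescale to [0, 1] for use as keyframe-selection scores.
    span = importance.max() - importance.min() + 1e-8
    return (importance - importance.min()) / span
```

Because the softmax runs over the full T x T relation matrix, frames that attract attention from many distinct frames score high, while near-duplicate frames split the same attention mass, which is one plausible reading of how diverse attention discourages redundant keyframes.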