Paper Title
Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method
Authors
Abstract
The growth of videos in our digital age and users' limited time raise the demand for processing untrimmed videos into shorter versions that convey the same information. Despite the remarkable progress of summarization methods, most of them can only select a few frames or skims, creating visual gaps and breaking the video context. This paper presents a novel weakly-supervised methodology, based on a reinforcement learning formulation, to accelerate instructional videos using text. A novel joint reward function guides our agent to select which frames to remove, reducing the input video to a target length without creating gaps in the final video. We also propose the Extended Visually-guided Document Attention Network (VDAN+), which generates a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best Precision, Recall, and F1 Score against the baselines while effectively controlling the video's output length. Visit https://www.verlab.dcc.ufmg.br/semantic-hyperlapse/tpami2022/ for code and extra results.
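To make the core idea concrete, the sketch below illustrates the kind of objective the abstract describes: a joint reward that balances semantic relevance (frames matching the text) against hitting a target output length, with a greedy frame-removal loop standing in for the learned RL policy. This is a minimal toy illustration, not the paper's actual method; `joint_reward`, `greedy_accelerate`, and the per-frame relevance scores are all hypothetical names and assumptions.

```python
import numpy as np

def joint_reward(semantic_scores, kept, target_len):
    # Hypothetical joint reward: a semantic term (mean text-relevance of
    # the kept frames) plus a length term penalizing deviation from the
    # target length, echoing the paper's semantic + length trade-off.
    semantic = semantic_scores[kept].mean()
    length = -abs(len(kept) - target_len) / len(semantic_scores)
    return semantic + length

def greedy_accelerate(semantic_scores, target_len):
    # Stand-in for the agent's policy: repeatedly drop the least
    # text-relevant frame until the video reaches the target length,
    # so no contiguous gap is ever introduced deliberately.
    kept = list(range(len(semantic_scores)))
    while len(kept) > target_len:
        worst = min(kept, key=lambda i: semantic_scores[i])
        kept.remove(worst)
    return kept

# Example: five frames with text-relevance scores, accelerated to 3 frames.
scores = np.array([0.9, 0.1, 0.8, 0.2, 0.7])
kept = greedy_accelerate(scores, 3)
reward = joint_reward(scores, kept, 3)
```

In the paper this greedy heuristic is replaced by a trained agent, and the relevance scores come from the VDAN+ embedding space rather than being given directly.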