Paper Title

Combining Internal and External Constraints for Unrolling Shutter in Videos

Paper Authors

Eyal Naor, Itai Antebi, Shai Bagon, Michal Irani

Paper Abstract

Videos obtained by rolling-shutter (RS) cameras result in spatially-distorted frames. These distortions become significant under fast camera/scene motions. Undoing the effects of RS is sometimes addressed as a spatial problem, where objects need to be rectified/displaced in order to generate their correct global-shutter (GS) frame. However, the cause of the RS effect is inherently temporal, not spatial. In this paper we propose a space-time solution to the RS problem. We observe that despite the severe differences between their xy frames, an RS video and its corresponding GS video tend to share the exact same xt slices -- up to a known sub-frame temporal shift. Moreover, they share the same distribution of small 2D xt-patches, despite the strong temporal aliasing within each video. This allows us to constrain the GS output video using video-specific constraints imposed by the RS input video. Our algorithm is composed of 3 main components: (i) dense temporal upsampling between consecutive RS frames using an off-the-shelf method (trained on regular video sequences), from which we extract GS "proposals"; (ii) learning to correctly merge an ensemble of such GS "proposals" using a dedicated MergeNet; (iii) a video-specific zero-shot optimization which imposes the similarity of xt-patches between the GS output video and the RS input video. Our method obtains state-of-the-art results on benchmark datasets, both numerically and visually, despite being trained on a small synthetic RS/GS dataset. Moreover, it generalizes well to new complex RS videos with motion types outside the distribution of the training set (e.g., complex non-rigid motions) -- videos which competing methods trained on much more data cannot handle well. We attribute these generalization capabilities to the combination of external and internal constraints.
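To make the space-time picture above concrete, here is a minimal NumPy sketch of the row-time relation underlying the method. It is an illustration under simplifying assumptions, not the authors' implementation: the function names (simulate_rolling_shutter, gs_proposal) are hypothetical, and it assumes a uniform readout of one dense-time step per row with no exposure blur. Given a video that has been densely upsampled in time (component (i) of the algorithm), a GS "proposal" is simply one frame of that dense stack, while an RS frame gathers each row from a progressively later time.

```python
import numpy as np

# Minimal sketch (not the authors' code) of the row-time relation the
# abstract builds on: in a rolling-shutter (RS) camera, each image row y
# is read out slightly later than row y-1. Given a video stack that is
# densely sampled in time (one frame per row-readout instant), a
# global-shutter (GS) frame takes all rows from a single time t, while an
# RS frame takes row y from time t + y * readout_per_row.

def simulate_rolling_shutter(dense_video, frame_start, readout_per_row=1):
    """Assemble one RS frame from a temporally dense frame stack.

    dense_video: array of shape (T, H, W), sampled at row-readout rate.
    frame_start: index in dense_video where the RS frame's top row is read.
    readout_per_row: dense-time steps between consecutive row readouts.
    """
    T, H = dense_video.shape[:2]
    rs_frame = np.empty_like(dense_video[0])
    for y in range(H):
        t = min(frame_start + y * readout_per_row, T - 1)
        rs_frame[y] = dense_video[t, y]  # row y was exposed at time t
    return rs_frame

def gs_proposal(dense_video, t):
    """A GS 'proposal' is one frame of the densely upsampled video:
    every row taken at the same time instant t."""
    return dense_video[t]

# Usage example: a vertical bar moving right. In the RS frame the bar
# appears slanted, because each row observed it at a later time. Note
# that for a fixed row y, the xt-slice of the RS video equals the
# xt-slice of the GS video shifted in time by y * readout_per_row, which
# is the property the paper's zero-shot optimization exploits.
T, H, W = 64, 32, 32
dense = np.zeros((T, H, W), dtype=np.float32)
for t in range(T):
    dense[t, :, (t // 2) % W] = 1.0  # bar shifts one column every 2 steps

rs = simulate_rolling_shutter(dense, frame_start=0)
print("bar column per row:", rs.argmax(axis=1)[:8])  # -> [0 0 1 1 2 2 3 3]
```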
