Paper Title

Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions

Paper Authors

Saikat Dutta, Arulkumar Subramaniam, Anurag Mittal

Paper Abstract

Video frame interpolation aims to synthesize one or multiple frames between two consecutive frames in a video. It has a wide range of applications including slow-motion video generation, frame-rate up-scaling and developing video codecs. Some older works tackled this problem by assuming per-pixel linear motion between video frames. However, objects often follow a non-linear motion pattern in the real domain and some recent methods attempt to model per-pixel motion by non-linear models (e.g., quadratic). A quadratic model can also be inaccurate, especially in the case of motion discontinuities over time (i.e. sudden jerks) and occlusions, where some of the flow information may be invalid or inaccurate. In our paper, we propose to approximate the per-pixel motion using a space-time convolution network that is able to adaptively select the motion model to be used. Specifically, we are able to softly switch between a linear and a quadratic model. Towards this end, we use an end-to-end 3D CNN encoder-decoder architecture over bidirectional optical flows and occlusion maps to estimate the non-linear motion model of each pixel. Further, a motion refinement module is employed to refine the non-linear motion and the interpolated frames are estimated by a simple warping of the neighboring frames with the estimated per-pixel motion. Through a set of comprehensive experiments, we validate the effectiveness of our model and show that our method outperforms state-of-the-art algorithms on four datasets (Vimeo, DAVIS, HD and GoPro).
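To make the abstract's "soft switch" between motion models concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes the standard constant-acceleration parameterization used in quadratic video interpolation (velocity and acceleration recovered from the forward flow f_{0->1} and backward flow f_{0->-1}), and `alpha` stands in for the per-pixel weight that the paper's 3D CNN encoder-decoder would predict; all names here are hypothetical.

```python
import numpy as np

def linear_flow(f_01, t):
    # Linear model: constant velocity, so flow simply scales with t.
    return t * f_01

def quadratic_flow(f_01, f_0m1, t):
    # Quadratic (constant-acceleration) model: velocity and acceleration
    # are estimated from the flows to the next and previous frames.
    velocity = (f_01 - f_0m1) / 2.0
    acceleration = f_01 + f_0m1
    return velocity * t + 0.5 * acceleration * t ** 2

def blended_flow(f_01, f_0m1, alpha, t):
    # Soft switch: alpha in [0, 1] is a per-pixel weight (predicted by a
    # network in the paper; taken as an input here) that blends the two
    # motion models instead of hard-selecting one.
    return alpha * linear_flow(f_01, t) + (1.0 - alpha) * quadratic_flow(f_01, f_0m1, t)

# Toy example: 2x2 flow fields with 2 channels (dx, dy).
f_01 = np.ones((2, 2, 2))          # flow from frame 0 to frame 1
f_0m1 = -0.8 * np.ones((2, 2, 2))  # flow from frame 0 to frame -1
alpha = np.full((2, 2, 1), 0.7)    # hypothetical per-pixel model weight
print(blended_flow(f_01, f_0m1, alpha, t=0.5))
```

Where motion is smooth, alpha near 0 lets the quadratic term capture acceleration; near motion discontinuities or occlusions, where the extra flow information may be invalid, alpha near 1 falls back to the more robust linear model.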
