Paper Title
M-LVC: Multiple Frames Prediction for Learned Video Compression
Paper Authors
Abstract
We propose an end-to-end learned video compression scheme for low-latency scenarios. Previous methods are limited to using only the single previous frame as reference. Our method introduces the use of multiple previous frames as references. In our scheme, the motion vector (MV) field is calculated between the current frame and the previous one. With multiple reference frames and their associated MV fields, our designed network can generate a more accurate prediction of the current frame, yielding a smaller residual. The multiple reference frames also help generate an MV prediction, which reduces the coding cost of the MV field. We use two deep auto-encoders to compress the residual and the MV, respectively. To compensate for the compression error of the auto-encoders, we further design an MV refinement network and a residual refinement network, which also make use of the multiple reference frames. All the modules in our scheme are jointly optimized through a single rate-distortion loss function, and we use a step-by-step training strategy to optimize the entire scheme. Experimental results show that the proposed method outperforms existing learned video compression methods in low-latency mode. Our method also performs better than H.265 in both PSNR and MS-SSIM. Our code and models are publicly available.
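The abstract states that all modules are trained jointly through a single rate-distortion loss. A minimal sketch of such an objective, assuming the common Lagrangian form L = λ·D + R, where D is the reconstruction distortion and R is the total bit cost of the compressed MV field and residual (the function and parameter names here are illustrative, not the authors' actual code):

```python
def rate_distortion_loss(distortion_mse, mv_bits, residual_bits, lam=0.01):
    """Single rate-distortion objective combining quality and bit cost.

    distortion_mse -- mean squared error between the original and
                      reconstructed frame (the distortion term D)
    mv_bits        -- estimated bits spent on the compressed MV latent
    residual_bits  -- estimated bits spent on the compressed residual latent
    lam            -- Lagrange multiplier trading off distortion vs. rate;
                      training at different lam values yields different
                      points on the rate-distortion curve
    """
    rate = mv_bits + residual_bits
    return lam * distortion_mse + rate


# Example: a frame with MSE 100.0 costing 2.0 + 3.0 bits per pixel
loss = rate_distortion_loss(100.0, 2.0, 3.0, lam=0.01)
print(loss)  # 1.0 (weighted distortion) + 5.0 (rate) = 6.0
```

Because both the MV auto-encoder and the residual auto-encoder contribute to the rate term, gradients from this one scalar flow into every module, which is what makes the joint end-to-end optimization described in the abstract possible.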