Paper Title
Learned Video Compression for YUV 4:2:0 Content Using Flow-based Conditional Inter-frame Coding
Paper Authors
Paper Abstract
This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content. Most existing learning-based video compression models adopt the traditional hybrid coding architecture, which involves temporal prediction followed by residual coding. However, recent studies have shown that residual coding is sub-optimal from an information-theoretic perspective. In addition, most existing models are optimized for RGB content, and they require separate models for variable-rate coding. To address these issues, this work presents an attempt to incorporate conditional inter-frame coding for YUV 4:2:0 content. We introduce a conditional flow-based inter-frame coder to improve inter-frame coding efficiency. To adapt our codec to YUV 4:2:0 content, we adopt a simple strategy of using space-to-depth and depth-to-space conversions. Lastly, we employ a rate-adaptation network to achieve variable-rate coding without training multiple models. Experimental results show that our model outperforms x265 on the UVG and MCL-JCV datasets in terms of PSNR-YUV. However, on the more challenging datasets from the ISCAS'22 Grand Challenge (GC), there is still ample room for improvement. This performance gap is due to the lack of inter-frame coding capability at large GOP sizes, and it can be mitigated by increasing the model capacity and applying an error-propagation-aware training strategy.
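To make the space-to-depth strategy concrete, here is a minimal PyTorch sketch of how YUV 4:2:0 planes can be packed into a single spatially aligned tensor and unpacked again. The function names `pack_yuv420`/`unpack_yuv420` and the factor-2 `pixel_unshuffle` are illustrative assumptions; the abstract only states that space-to-depth and depth-to-space conversions are used.

```python
# Sketch of space-to-depth packing for YUV 4:2:0, assuming the luma plane is
# rearranged by a factor of 2 so that all channels share the chroma resolution.
import torch
import torch.nn.functional as F

def pack_yuv420(y: torch.Tensor, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Pack YUV 4:2:0 planes into one tensor with spatially aligned channels.

    y: (N, 1, H, W) luma plane
    u, v: (N, 1, H/2, W/2) chroma planes
    returns: (N, 6, H/2, W/2) tensor (4 luma sub-planes + 2 chroma planes)
    """
    y_sub = F.pixel_unshuffle(y, downscale_factor=2)  # space-to-depth: (N, 4, H/2, W/2)
    return torch.cat([y_sub, u, v], dim=1)

def unpack_yuv420(x: torch.Tensor):
    """Inverse of pack_yuv420: recover the three YUV 4:2:0 planes."""
    y_sub, u, v = x[:, :4], x[:, 4:5], x[:, 5:6]
    y = F.pixel_shuffle(y_sub, upscale_factor=2)      # depth-to-space: (N, 1, H, W)
    return y, u, v

# Round-trip check on a random frame: both conversions are pure
# rearrangements, so the planes are recovered exactly.
y = torch.rand(1, 1, 64, 96)
u, v = torch.rand(1, 1, 32, 48), torch.rand(1, 1, 32, 48)
packed = pack_yuv420(y, u, v)                         # (1, 6, 32, 48)
y2, u2, v2 = unpack_yuv420(packed)
assert torch.equal(y, y2) and torch.equal(u, u2) and torch.equal(v, v2)
```

A convenience of this packing is that a standard convolutional codec designed for multi-channel inputs can process the 6-channel tensor directly, without separate luma and chroma branches.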
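The abstract does not detail the rate-adaptation network, so the following sketch is purely an assumption: one common realization of variable-rate coding (in the spirit of gain-unit approaches) maps a scalar rate level to per-channel gains that rescale the latent, letting a single trained model cover multiple rate points.

```python
# Hypothetical rate-adaptation module: NOT the authors' design, only an
# illustration of how a scalar rate level can modulate a latent tensor.
import torch
import torch.nn as nn

class RateAdaptionNet(nn.Module):
    """Map a scalar rate level to per-channel scaling of the latent."""
    def __init__(self, num_channels: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, 64), nn.ReLU(),
            nn.Linear(64, num_channels), nn.Softplus(),  # keep gains positive
        )

    def forward(self, latent: torch.Tensor, rate_level: torch.Tensor) -> torch.Tensor:
        gain = self.mlp(rate_level.view(-1, 1))           # (N, C)
        return latent * gain.unsqueeze(-1).unsqueeze(-1)  # scale each channel

# Varying rate_level trades off rate and distortion with one set of weights,
# instead of training a separate model per rate point.
net = RateAdaptionNet(num_channels=128)
latent = torch.randn(1, 128, 16, 16)
scaled = net(latent, torch.tensor([0.5]))                 # (1, 128, 16, 16)
```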