Learned Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution

Paper Title

Learned Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution

Authors

Lin, Jianping; Akbari, Mohammad; Fu, Haisheng; Zhang, Qian; Wang, Shang; Liang, Jie; Liu, Dong; Liang, Feng; Zhang, Guohe; Tu, Chengjie

Abstract

In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components. The LF components have a lower resolution than the HF components, which can improve the rate-distortion performance, similar to a wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more of the spatial structure of the information and enable more effective filtering between the HF and LF components, which further improves the performance. In addition, we develop a variable-rate scheme that uses the Lagrangian parameter to modulate all the internal feature maps in the auto-encoder, which allows the scheme to cover the large bitrate range of JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme performs very similarly to HEVC.
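The multi-frequency factorization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It only shows the resolution bookkeeping: some channels stay at full resolution (HF) and the rest are stored at reduced resolution (LF). The helper names, the channel ratio `alpha`, the factor-of-two downsampling, and the fixed 2x2 averaging (standing in for the learned filters of GoConv) are all assumptions made for illustration.

```python
def avg_pool_2x2(channel):
    """Downsample one 2-D channel (a list of rows) by averaging 2x2 blocks.

    Assumption: fixed averaging stands in for a learned downsampling filter.
    """
    h, w = len(channel), len(channel[0])
    return [
        [
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


def split_hf_lf(features, alpha=0.5):
    """Split a list of channels into an HF group and an LF group.

    alpha is the (hypothetical) fraction of channels assigned to the LF
    branch. HF channels keep full resolution; LF channels are halved in
    each spatial dimension.
    """
    n_lf = int(round(alpha * len(features)))
    n_hf = len(features) - n_lf
    hf = features[:n_hf]
    lf = [avg_pool_2x2(c) for c in features[n_hf:]]
    return hf, lf


def count_elements(channels):
    """Total number of scalar values stored across all channels."""
    return sum(len(row) for c in channels for row in c)


# Four 4x4 channels with the same toy content.
base = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
features = [[row[:] for row in base] for _ in range(4)]
hf, lf = split_hf_lf(features, alpha=0.5)
```

In this toy case the four full-resolution channels hold 64 values, while the HF/LF split holds 32 + 8 = 40. That shows how a lower-resolution LF branch reduces the size of the latent representation to be coded. Whether this improves rate-distortion performance depends on the learned filters and the entropy model, which the sketch does not cover.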
