Towards Modality Transferable Visual Information Representation with Optimal Model Compression

Paper Title

Towards Modality Transferable Visual Information Representation with Optimal Model Compression

Authors

Lin, Rongqun; Zhu, Linwei; Wang, Shiqi; Kwong, Sam

Abstract

Compactly representing the visual signals is of fundamental importance in various image/video-centered applications. Although numerous approaches were developed for improving the image and video coding performance by removing the redundancies within visual signals, much less work has been dedicated to the transformation of the visual signals to another well-established modality for better representation capability. In this paper, we propose a new scheme for visual signal representation that leverages the philosophy of transferable modality. In particular, the deep learning model, which characterizes and absorbs the statistics of the input scene with online training, could be efficiently represented in the sense of rate-utility optimization to serve as the enhancement layer in the bitstream. As such, the overall performance can be further guaranteed by optimizing the new modality incorporated. The proposed framework is implemented on the state-of-the-art video coding standard (i.e., versatile video coding), and significantly better representation capability has been observed based on extensive evaluations.
