Paper Title
ITTR: Unpaired Image-to-Image Translation with Transformers
Paper Authors
Paper Abstract
Unpaired image-to-image translation is the task of translating an image from a source domain to a target domain without paired training data. By utilizing CNNs to extract local semantics, various techniques have been developed to improve translation performance. However, CNN-based generators lack the ability to capture long-range dependencies and thus cannot fully exploit global semantics. Recently, Vision Transformers have been widely investigated for recognition tasks. Though appealing, simply transferring a recognition-oriented vision transformer to image-to-image translation is inappropriate due to the difficulty of generation and computational constraints. In this paper, we propose an effective and efficient architecture for unpaired Image-to-Image Translation with Transformers (ITTR). It has two main designs: 1) a hybrid perception block (HPB) that mixes tokens from different receptive fields to exploit global semantics; 2) dual pruned self-attention (DPSA) to sharply reduce computational complexity. Our ITTR outperforms state-of-the-art methods for unpaired image-to-image translation on six benchmark datasets.
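To make the two named designs concrete, below is a minimal PyTorch sketch. The abstract does not give the exact formulation, so this is an illustration under stated assumptions, not the authors' published implementation: the pruning criterion (keeping the top-k keys/values ranked by key-vector L2 norm) and the HPB layout (a depth-wise convolution branch for local semantics fused with a pruned-attention branch for global semantics) are both assumptions, and the names `DualPrunedSelfAttention` and `HybridPerceptionBlock` are hypothetical.

```python
import torch
import torch.nn as nn


class DualPrunedSelfAttention(nn.Module):
    """Self-attention that attends only to the top-k key/value tokens,
    cutting the attention cost from O(N^2) to O(N*k) for N tokens.
    This is one assumed reading of "dual pruned self-attention"."""

    def __init__(self, dim, k=64):
        super().__init__()
        self.k = k
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                               # x: (B, N, C)
        q, k_, v = self.to_qkv(x).chunk(3, dim=-1)
        # Score each key token by its L2 norm and keep the top-k;
        # the real pruning rule in the paper may differ (assumption).
        keep = min(self.k, k_.shape[1])
        idx = k_.norm(dim=-1).topk(keep, dim=1).indices  # (B, keep)
        idx = idx.unsqueeze(-1).expand(-1, -1, k_.shape[-1])
        k_p, v_p = k_.gather(1, idx), v.gather(1, idx)   # pruned K, V
        attn = (q @ k_p.transpose(-2, -1)) * self.scale  # (B, N, keep)
        return self.proj(attn.softmax(dim=-1) @ v_p)     # (B, N, C)


class HybridPerceptionBlock(nn.Module):
    """Mixes tokens at two receptive fields: a depth-wise conv branch
    for local semantics and a pruned-attention branch for global
    semantics, fused with a residual connection."""

    def __init__(self, dim, k=64):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.global_attn = DualPrunedSelfAttention(dim, k=k)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        local = self.local(x)                           # local branch
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        glob = self.global_attn(self.norm(tokens))      # global branch
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + local + glob                         # residual fusion


# Usage: the block is shape-preserving, so several of them could be
# stacked inside a ResNet-style translation generator (an assumption
# about the overall backbone, which the abstract does not specify).
hpb = HybridPerceptionBlock(dim=256, k=64)
out = hpb(torch.randn(1, 256, 64, 64))                  # (1, 256, 64, 64)
```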