Paper Title

Cross Aggregation Transformer for Image Restoration

Paper Authors

Zheng Chen, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin Yuan

Paper Abstract

Recently, the Transformer architecture has been introduced into image restoration to replace convolutional neural networks (CNNs), with surprising results. Considering the high computational complexity of Transformers with global attention, some methods use local square windows to limit the scope of self-attention. However, these methods lack direct interaction among different windows, which limits the establishment of long-range dependencies. To address this issue, we propose a new image restoration model, Cross Aggregation Transformer (CAT). The core of our CAT is the Rectangle-Window Self-Attention (Rwin-SA), which applies horizontal and vertical rectangle window attention in different heads in parallel to expand the attention area and aggregate features across different windows. We also introduce the Axial-Shift operation for interaction among different windows. Furthermore, we propose the Locality Complementary Module to complement the self-attention mechanism, which incorporates the inductive bias of CNNs (e.g., translation invariance and locality) into the Transformer, enabling global-local coupling. Extensive experiments demonstrate that our CAT outperforms recent state-of-the-art methods on several image restoration applications. The code and models are available at https://github.com/zhengchen1999/CAT.
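To make the Rwin-SA idea concrete, below is a minimal PyTorch sketch of rectangle-window self-attention. The specific window shapes (4x16 horizontal, 16x4 vertical), the channel-split emulation of parallel heads, and the omission of QKV projections and relative position bias are all simplifying assumptions for illustration, not the authors' implementation (see https://github.com/zhengchen1999/CAT for the full model).

```python
# Minimal sketch of Rectangle-Window Self-Attention (Rwin-SA).
# Assumptions: hypothetical 4x16 / 16x4 windows, no learned projections,
# parallel heads emulated by splitting channels into two groups.
import torch
import torch.nn.functional as F


def window_partition(x, wh, ww):
    """Split a feature map (B, H, W, C) into (num_windows*B, wh*ww, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // wh, wh, W // ww, ww, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, wh * ww, C)


def window_reverse(windows, wh, ww, H, W):
    """Inverse of window_partition, back to (B, H, W, C)."""
    B = windows.shape[0] // ((H // wh) * (W // ww))
    x = windows.view(B, H // wh, W // ww, wh, ww, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


def rwin_self_attention(x, wh, ww):
    """Scaled dot-product self-attention inside each wh x ww rectangle window."""
    B, H, W, C = x.shape
    windows = window_partition(x, wh, ww)          # (nW*B, wh*ww, C)
    q = k = v = windows                            # no QKV projections in this sketch
    attn = F.softmax(q @ k.transpose(-2, -1) / C ** 0.5, dim=-1)
    return window_reverse(attn @ v, wh, ww, H, W)


def cross_aggregation(x):
    """Horizontal and vertical rectangle windows run in parallel 'heads';
    here emulated by attending over two channel groups and concatenating."""
    c = x.shape[-1] // 2
    horiz = rwin_self_attention(x[..., :c], wh=4, ww=16)   # wide window
    vert = rwin_self_attention(x[..., c:], wh=16, ww=4)    # tall window
    return torch.cat([horiz, vert], dim=-1)


x = torch.randn(1, 32, 32, 8)        # (B, H, W, C) toy feature map
print(cross_aggregation(x).shape)    # torch.Size([1, 32, 32, 8])
```

Because the two axes are attended to simultaneously, each output position aggregates context from a cross-shaped region rather than a single square window; per the abstract, consecutive blocks additionally apply the Axial-Shift operation so that features in neighboring rectangle windows can also interact.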
