Paper Title

QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation

Paper Authors

Xueqi Hu, Xinyue Zhou, Qiusheng Huang, Zhengyi Shi, Li Sun, Qingli Li

Paper Abstract

Unpaired image-to-image (I2I) translation often requires maximizing the mutual information between the source and the translated images across different domains, which is critical for the generator to keep the source content and prevent unnecessary modifications. Self-supervised contrastive learning has already been successfully applied in I2I. By constraining features from the same location to be closer than those from different ones, it implicitly ensures that the result takes content from the source. However, previous work imposes the constraint using features from random locations, which may not be appropriate, since some locations contain less source-domain information. Moreover, the feature itself does not reflect its relation with others. This paper deals with these problems by intentionally selecting significant anchor points for contrastive learning. We design a query-selected attention (QS-Attn) module, which compares feature distances in the source domain, giving an attention matrix with a probability distribution in each row. We then select queries according to a measure of significance computed from this distribution. The selected queries are regarded as anchors for the contrastive loss. At the same time, the reduced attention matrix is employed to route features in both domains, so that source relations are maintained in the synthesis. We validate the proposed method on three different I2I datasets, showing that it increases image quality without adding learnable parameters.
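The core mechanism described above, ranking attention rows by a significance measure computed from their probability distributions and routing features in both domains with the reduced matrix, can be illustrated in a few lines. Below is a minimal PyTorch sketch based only on the abstract: the entropy criterion (one natural measure of how concentrated a row distribution is), the 1/sqrt(C) scaling, the default `num_queries`, and all tensor shapes are assumptions for illustration, not the authors' reference implementation.

```python
# Minimal sketch of query-selected attention, assuming PyTorch feature maps of
# shape (B, C, H, W); names, shapes and defaults are illustrative assumptions.
import torch
import torch.nn.functional as F

def query_selected_attention(src_feat, tgt_feat, num_queries=256):
    """Select significant queries from the source attention matrix, then route
    features in both domains with the reduced attention.

    src_feat, tgt_feat: (B, C, H, W) encoder features of the source / translated image.
    Returns routed features of shape (B, num_queries, C) for each domain.
    """
    B, C, H, W = src_feat.shape
    src = src_feat.flatten(2).permute(0, 2, 1)            # (B, HW, C)
    tgt = tgt_feat.flatten(2).permute(0, 2, 1)            # (B, HW, C)

    # Attention matrix computed only in the source domain,
    # softmax gives a probability distribution in each row.
    attn = torch.bmm(src, src.transpose(1, 2))            # (B, HW, HW)
    attn = F.softmax(attn / C ** 0.5, dim=-1)

    # Significance of each query = entropy of its attention row
    # (lower entropy = more focused, hence more significant).
    entropy = -(attn * torch.log(attn + 1e-8)).sum(dim=-1)   # (B, HW)
    idx = entropy.argsort(dim=-1)[:, :num_queries]            # keep most focused rows

    # Reduced attention matrix: only the selected rows are retained.
    attn_qs = torch.gather(attn, 1,
                           idx.unsqueeze(-1).expand(-1, -1, H * W))  # (B, num_queries, HW)

    # Route value features in both domains with the same reduced attention,
    # so source relations are carried over to the synthesis.
    src_routed = torch.bmm(attn_qs, src)                  # (B, num_queries, C)
    tgt_routed = torch.bmm(attn_qs, tgt)                  # (B, num_queries, C)
    return src_routed, tgt_routed
```

In this reading of the abstract, the routed features from the two domains would then serve as anchors and positives/negatives for the contrastive loss, with corresponding positions pulled together and the rest pushed apart; no learnable parameters are added by the selection step itself.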
