Paper Title
Towards Understanding and Boosting Adversarial Transferability from a Distribution Perspective
Paper Authors
Paper Abstract
Transferable adversarial attacks against deep neural networks (DNNs) have received broad attention in recent years. An adversarial example can be crafted on a surrogate model and then successfully attack an unknown target model, which poses a severe threat to DNNs. The exact underlying reasons for this transferability are still not completely understood. Previous work mostly explores the causes from the model perspective, e.g., decision boundary, model architecture, and model capacity. Here, we investigate transferability from the data distribution perspective and hypothesize that pushing an image away from its original distribution enhances adversarial transferability. Specifically, moving the image out of its original distribution makes it hard for different models to classify the image correctly, which benefits the untargeted attack, while dragging the image into the target distribution misleads models into classifying the image as the target class, which benefits the targeted attack. Towards this end, we propose a novel method that crafts adversarial examples by manipulating the distribution of the image. We conduct comprehensive transferable attacks against multiple DNNs to demonstrate the effectiveness of the proposed method. Our method significantly improves the transferability of the crafted attacks and achieves state-of-the-art performance in both untargeted and targeted scenarios, surpassing the previous best method by up to 40$\%$ in some cases.
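The abstract describes crafting adversarial examples by moving an image out of its original class distribution (untargeted) or dragging it into a target class distribution (targeted). Below is a minimal sketch of that intuition only, not the authors' actual method: it approximates the distribution shift with an iterative sign-gradient attack on a surrogate classifier's cross-entropy loss, in the spirit of iterative FGSM/PGD. The function name distribution_shift_attack and all hyperparameters (eps, alpha, steps) are illustrative assumptions.

    # Sketch of the distributional intuition from the abstract on a PyTorch
    # surrogate classifier; hyperparameters and names are illustrative.
    import torch
    import torch.nn.functional as F

    def distribution_shift_attack(model, x, y, eps=16/255, alpha=2/255,
                                  steps=10, target=None):
        """target=None -> untargeted: push x away from its original class
        distribution by ascending the loss on the true label y.
        target=int   -> targeted: drag x toward the target distribution by
        descending the loss on the target class."""
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            logits = model(x_adv)
            if target is None:
                loss = F.cross_entropy(logits, y)       # maximize: leave p(x|y)
            else:
                t = torch.full_like(y, target)
                loss = -F.cross_entropy(logits, t)      # minimize CE to target
            grad, = torch.autograd.grad(loss, x_adv)
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()               # L_inf step
                x_adv = x + (x_adv - x).clamp(-eps, eps)          # eps-ball
                x_adv = x_adv.clamp(0, 1)               # keep a valid image
        return x_adv.detach()

Note that this sketch only follows the classifier's loss surface; the paper's actual method manipulates the image's data distribution directly, which this surrogate-loss approximation does not capture.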