Paper Title
Luring of transferable adversarial perturbations in the black-box paradigm
Paper Authors
Paper Abstract
The growing interest in adversarial examples, i.e. maliciously modified examples which fool a classifier, has resulted in many defenses intended to detect them, render them inoffensive, or make the model more robust against them. In this paper, we pave the way towards a new approach to improve the robustness of a model against black-box transfer attacks. A removable additional neural network is included in the target model, and is designed to induce the \textit{luring effect}, which tricks the adversary into choosing false directions to fool the target model. The additional model is trained with a loss function acting on the order of the logits sequence. Our deception-based method only needs access to the predictions of the target model and does not require a labeled data set. We explain the luring effect using the notion of robust and non-robust useful features and perform experiments on MNIST, SVHN and CIFAR10 to characterize and evaluate this phenomenon. Additionally, we discuss two simple prediction schemes, and verify experimentally that our approach can be used as a defense to efficiently thwart an adversary using state-of-the-art attacks and allowed to perform large perturbations.
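The abstract only states that the auxiliary network is trained from the target model's predictions, with a loss acting on the order of the logits. The sketch below is a minimal, hypothetical illustration of that setup, not the authors' exact formulation: the names `P` (auxiliary network), `T` (frozen target classifier), `luring_loss`, and the specific "disagreement" term on the non-top-1 logits are all assumptions introduced for illustration.

```python
# Hypothetical sketch of a "luring" training setup (assumptions, not the paper's exact loss).
# Idea: stack an auxiliary network P in front of a frozen target classifier T, and train P
# so that T(P(x)) keeps the same top-1 class as T(x) while the ordering of the remaining
# logits differs, so that adversarial directions found on T(P(x)) transfer poorly to T alone.
import torch
import torch.nn.functional as F

def luring_loss(logits_aug, logits_target):
    """logits_aug: logits of T(P(x)); logits_target: logits of T(x), treated as fixed."""
    top1 = logits_target.argmax(dim=1)                      # class predicted by the target model
    # Term 1: keep the augmented model's top-1 prediction equal to the target's (no labels needed).
    agree = F.cross_entropy(logits_aug, top1)
    # Term 2 (assumption): discourage agreement in the ordering of the non-top-1 logits,
    # approximated here by the inner product of the two renormalized "runner-up" distributions.
    mask = F.one_hot(top1, logits_aug.size(1)).bool()
    rest_aug = logits_aug.masked_fill(mask, float("-inf")).softmax(dim=1)
    rest_tgt = logits_target.masked_fill(mask, float("-inf")).softmax(dim=1)
    disagree = (rest_aug * rest_tgt).sum(dim=1).mean()       # small when the orderings differ
    return agree + disagree

def train_step(P, T, x, optimizer):
    """One unlabeled training step; only P's parameters are in the optimizer, T stays frozen."""
    with torch.no_grad():
        logits_target = T(x)          # only the target model's predictions are used
    logits_aug = T(P(x))              # backprop flows through T's graph, but only P is updated
    loss = luring_loss(logits_aug, logits_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the augmented model T∘P is what is exposed to the adversary, while T alone (with P removed) serves the clean predictions; the loss is only one plausible way to act on the logits order, chosen to keep the example self-contained.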