Paper Title

Adversarial Example Games

Paper Authors

Avishek Joey Bose, Gauthier Gidel, Hugo Berard, Andre Cianflone, Pascal Vincent, Simon Lacoste-Julien, William L. Hamilton

Paper Abstract

The existence of adversarial examples capable of fooling trained neural network classifiers calls for a much better understanding of possible attacks to guide the development of safeguards against them. This includes attack methods in the challenging non-interactive blackbox setting, where adversarial attacks are generated without any access, including queries, to the target model. Prior attacks in this setting have relied mainly on algorithmic innovations derived from empirical observations (e.g., that momentum helps), lacking principled transferability guarantees. In this work, we provide a theoretical foundation for crafting transferable adversarial examples to entire hypothesis classes. We introduce Adversarial Example Games (AEG), a framework that models the crafting of adversarial examples as a min-max game between a generator of attacks and a classifier. AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class (e.g., architecture). We prove that this game has an equilibrium, and that the optimal generator is able to craft adversarial examples that can attack any classifier from the corresponding hypothesis class. We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets, outperforming prior state-of-the-art approaches with an average relative improvement of $29.9\%$ and $47.2\%$ against undefended and robust models (Table 2 & 3) respectively.
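
The abstract describes the AEG objective as a min-max game, loosely of the form $\min_{f \in \mathcal{F}} \max_{g} \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell\big(f(g(x)),\, y\big)\big]$, where the generator $g$ crafts bounded perturbations and the classifier $f$ is drawn from the target hypothesis class. The sketch below is a minimal PyTorch illustration of that alternating game, not the authors' implementation: the architecture, the $\ell_\infty$ budget `EPS`, and all names (`PerturbationGenerator`, `aeg_step`, the optimizers) are assumptions for exposition, and the paper's actual formulation and regularization may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EPS = 0.3  # assumed L-infinity perturbation budget (MNIST-scale inputs)

class PerturbationGenerator(nn.Module):
    """Illustrative generator: maps a clean image to a bounded adversarial example."""
    def __init__(self, dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(),
                                 nn.Linear(dim, 512), nn.ReLU(),
                                 nn.Linear(512, dim))

    def forward(self, x):
        delta = EPS * torch.tanh(self.net(x)).view_as(x)  # ||delta||_inf <= EPS
        return (x + delta).clamp(0.0, 1.0)                # keep pixels in [0, 1]

def aeg_step(gen, clf, gen_opt, clf_opt, x, y):
    """One alternating min-max update: the classifier minimizes the loss on
    generated examples while the generator maximizes it."""
    # Classifier (min player): fit the current adversarial distribution.
    clf_opt.zero_grad()
    clf_loss = F.cross_entropy(clf(gen(x).detach()), y)
    clf_loss.backward()
    clf_opt.step()
    # Generator (max player): ascend the classifier's loss on its own outputs.
    gen_opt.zero_grad()
    gen_loss = -F.cross_entropy(clf(gen(x)), y)
    gen_loss.backward()
    gen_opt.step()
    return clf_loss.item(), -gen_loss.item()
```

In this toy setup the trained generator plays the role described in the abstract: once the game converges, it produces perturbed inputs intended to transfer to unseen classifiers from the same hypothesis class, without querying the target model.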
