Paper Title

Detection of Iterative Adversarial Attacks via Counter Attack

Authors

Matthias Rottmann, Kira Maag, Mathis Peyron, Natasa Krejic, Hanno Gottschalk

Abstract

Deep neural networks (DNNs) have proven to be powerful tools for processing unstructured data. However, for high-dimensional data like images, they are inherently vulnerable to adversarial attacks: small, almost invisible perturbations added to the input can be used to fool DNNs. Various attacks, hardening methods and detection methods have been introduced in recent years. Notoriously, Carlini-Wagner (CW) type attacks, computed by iterative minimization, are among the most difficult to detect. In this work we outline a mathematical proof that the CW attack can be used as a detector itself. That is, under certain assumptions and in the limit of attack iterations, this detector provides asymptotically optimal separation of original and attacked images. In numerical experiments, we validate this statement and furthermore obtain AUROC values of up to 99.73% on CIFAR10 and ImageNet. This is in the upper part of the spectrum of current state-of-the-art detection rates for CW attacks.
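The counter-attack idea lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the authors' implementation: a simplified CW-style L2 attack (omitting the tanh box constraint and the binary search over the trade-off constant c used in the full CW attack) is run against the model's own prediction, and the L2 norm of the resulting perturbation serves as the detection score. The helper names `cw_counter_attack` and `detection_score` are illustrative. Since an attacked input already lies close to a decision boundary, the counter attack flips its prediction with a much smaller perturbation than it needs for a clean input.

```python
import torch
import torch.nn.functional as F

def cw_counter_attack(model, x, steps=100, lr=0.01, c=1.0):
    # Simplified CW-style L2 attack on the model's own prediction:
    # minimize ||delta||_2^2 + c * max(0, z_pred - max_{j != pred} z_j).
    x = x.detach()
    pred = model(x).argmax(dim=1)          # class to attack away from
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(x + delta)
        onehot = F.one_hot(pred, logits.shape[1]).bool()
        pred_logit = logits[onehot]        # logit of the current prediction
        other_logit = logits.masked_fill(onehot, float("-inf")).max(dim=1).values
        margin = torch.clamp(pred_logit - other_logit, min=0.0)  # CW f-term
        loss = ((delta.flatten(1) ** 2).sum(dim=1) + c * margin).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()

def detection_score(model, x):
    # Counter-attack score: L2 norm of the perturbation needed to flip
    # the prediction. Attacked inputs sit close to the decision boundary,
    # so a SMALL score flags a likely attacked input.
    return cw_counter_attack(model, x).flatten(1).norm(dim=1)
```

On a mixed batch of clean and attacked images, thresholding this score yields a detector; since attacked inputs produce smaller scores, passing the negated scores to a standard AUROC routine (e.g. `sklearn.metrics.roc_auc_score`) gives the kind of evaluation reported in the abstract.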
