Adversarial Detection without Model Information

Paper title

Adversarial Detection without Model Information

Authors

Abhishek Moitra, Youngeun Kim, Priyadarshini Panda

Abstract

Prior state-of-the-art adversarial detection works are classifier-model dependent, i.e., they require the classifier model's outputs and parameters either to train the detector or during adversarial detection. This makes their detection approach specific to one classifier model. Furthermore, classifier model outputs and parameters might not always be accessible. To this end, we propose a classifier-model-independent adversarial detection method that uses a simple energy function to distinguish between adversarial and natural inputs. We train a standalone detector, independent of the classifier model, with layer-wise energy separation (LES) training to increase the separation between natural and adversarial energies. With this, we perform energy-distribution-based adversarial detection. Our method achieves performance comparable to state-of-the-art detection works (ROC-AUC > 0.9) across a wide range of gradient, score, and Gaussian noise attacks on the CIFAR10, CIFAR100, and TinyImagenet datasets. Furthermore, compared to prior works, our detection approach is lightweight, requires less training data (40% of the actual dataset), and is transferable across different datasets. For reproducibility, we provide layer-wise energy separation training code at https://github.com/Intelligent-Computing-Lab-Yale/Energy-Separation-Training.
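The detection step described in the abstract can be illustrated with a minimal sketch. It assumes each input has already been reduced to one scalar energy and that adversarial inputs tend to have higher energy than natural ones; the abstract states neither the form of the energy function nor the direction of the separation, so both are assumptions here. The names `roc_auc` and `detect` are hypothetical helpers and are not taken from the authors' repository.

```python
import numpy as np


def roc_auc(natural_energy, adversarial_energy):
    """ROC-AUC of a detector that scores inputs by their energy.

    Equals the probability that a randomly chosen adversarial sample has
    higher energy than a randomly chosen natural sample (ties count half).
    Assumption: higher energy indicates an adversarial input.
    """
    nat = np.asarray(natural_energy, dtype=float)
    adv = np.asarray(adversarial_energy, dtype=float)
    # Compare every adversarial energy against every natural energy.
    greater = (adv[:, None] > nat[None, :]).sum()
    ties = (adv[:, None] == nat[None, :]).sum()
    return (greater + 0.5 * ties) / (adv.size * nat.size)


def detect(energy, threshold):
    """Flag inputs whose energy exceeds a chosen threshold as adversarial."""
    return np.asarray(energy, dtype=float) > threshold
```

In this sketch, fully separated energy distributions give an ROC-AUC of 1.0 and identical distributions give 0.5, which is why increasing the separation between natural and adversarial energies directly improves the reported metric.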