Paper Title
Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries
Paper Authors
Paper Abstract
The security of deep learning (DL) systems is an extremely important field of study, as these systems are being deployed in numerous applications due to their ever-improving performance on challenging tasks. Despite their overwhelming promise, deep learning systems are vulnerable to carefully crafted adversarial examples, which may be imperceptible to the human eye but can cause the model to misclassify. Ensemble-based protections against adversarial perturbations have either been shown to be vulnerable to stronger adversaries or to lack an end-to-end evaluation. In this paper, we develop a new ensemble-based solution that constructs defender models with decision boundaries diverse from that of the original model. The ensemble of classifiers, constructed by (1) transforming the input with a method called Split-and-Shuffle and (2) restricting the significant features with a method called Contrast-Significant-Features, is shown to produce diverse gradients with respect to adversarial attacks, which reduces the chance of an adversarial example transferring from the original model to a defender model while targeting the same class. We present extensive experiments on standard image classification datasets, namely MNIST, CIFAR-10, and CIFAR-100, against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble-based defense. We also evaluate the robustness in the presence of a stronger adversary targeting all the models within the ensemble simultaneously. Results for the overall false positives and false negatives are furnished to estimate the overall performance of the proposed methodology.
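The abstract does not spell out the Split-and-Shuffle transformation in detail. As an illustration only, the sketch below shows one plausible reading: the input image is split into a fixed grid of blocks, which are then permuted with a fixed seed, so that a defender model trained on the shuffled view learns a decision boundary diverse from the original model's. The function name `split_and_shuffle`, the 2x2 grid, and the seeded permutation are assumptions for this sketch, not the paper's exact procedure.

```python
import numpy as np

def split_and_shuffle(image: np.ndarray, grid: int = 2, seed: int = 0) -> np.ndarray:
    """Illustrative sketch: partition an HxW(xC) image into grid x grid equal
    blocks and permute them with a fixed seed. A defender model trained on
    this shuffled view sees a transformed input distribution, which is the
    intuition behind diverse decision boundaries. Assumes H and W are
    divisible by `grid`."""
    h, w = image.shape[0], image.shape[1]
    bh, bw = h // grid, w // grid
    # Collect the blocks in row-major order.
    blocks = [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
              for r in range(grid) for c in range(grid)]
    # Use a fixed, reproducible permutation so training and inference agree.
    perm = np.random.default_rng(seed).permutation(len(blocks))
    out = np.empty_like(image)
    for i, p in enumerate(perm):
        r, c = divmod(i, grid)
        out[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw] = blocks[p]
    return out

# Example: shuffle a 28x28 MNIST-sized input into 2x2 blocks before
# feeding it to a defender model in the ensemble.
x = np.zeros((28, 28), dtype=np.float32)
x_shuffled = split_and_shuffle(x, grid=2, seed=42)
```

Because the same seeded permutation is applied at training and inference time, the defender model remains a consistent classifier while its gradients with respect to the raw input differ from the original model's, which is what limits the transferability of adversarial examples across the ensemble.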