Paper Title
Towards Defending Multiple $\ell_p$-norm Bounded Adversarial Perturbations via Gated Batch Normalization
Paper Authors
Paper Abstract
There has been extensive evidence demonstrating that deep neural networks are vulnerable to adversarial examples, which motivates the development of defenses against adversarial attacks. Existing adversarial defenses typically improve model robustness against a single, specific perturbation type (\eg, $\ell_{\infty}$-norm bounded adversarial examples). However, adversaries are likely to generate multiple types of perturbations in practice (\eg, $\ell_1$, $\ell_2$, and $\ell_{\infty}$ perturbations). Some recent methods improve model robustness against adversarial attacks in multiple $\ell_p$ balls, but their performance against each perturbation type is still far from satisfactory. In this paper, we observe that different $\ell_p$-norm bounded adversarial perturbations induce different statistical properties that can be separated and characterized by the statistics of Batch Normalization (BN). We thus propose Gated Batch Normalization (GBN) to adversarially train a perturbation-invariant predictor for defending against multiple $\ell_p$-norm bounded adversarial perturbations. GBN consists of a multi-branch BN layer and a gated sub-network. Each BN branch in GBN is in charge of one perturbation type, ensuring that the normalized output is aligned towards learning perturbation-invariant representations. Meanwhile, the gated sub-network is designed to separate inputs perturbed with different perturbation types. We perform an extensive evaluation of our approach on commonly used datasets, including MNIST, CIFAR-10, and Tiny-ImageNet, and demonstrate that GBN outperforms previous defense proposals against multiple perturbation types (\ie, $\ell_1$, $\ell_2$, and $\ell_{\infty}$ perturbations) by large margins.
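To make the described architecture concrete, below is a minimal PyTorch sketch of how a GBN-style layer could be wired: one BN branch per perturbation type plus a gated sub-network that routes inputs to branches. The class name `GatedBatchNorm2d`, the pooled-feature gate, and the soft mixture at inference time are illustrative assumptions based on the abstract, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class GatedBatchNorm2d(nn.Module):
    """Minimal sketch of a Gated Batch Normalization (GBN) layer.

    One BN branch per perturbation type (e.g., l1, l2, l_inf, and clean),
    plus a small gated sub-network that predicts which branch should
    normalize each input. The soft (weighted) mixture at inference is a
    hypothetical design choice; the paper's gating may differ.
    """

    def __init__(self, num_features, num_branches=4):
        super().__init__()
        # One BN branch per perturbation type; each branch keeps its own
        # running statistics and affine parameters.
        self.branches = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(num_branches)
        )
        # Gated sub-network: predicts a distribution over branches from
        # globally pooled features (an assumed, simple gate architecture).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(num_features, num_branches),
        )

    def forward(self, x, branch_idx=None):
        if branch_idx is not None:
            # Training: the perturbation type of the batch is known
            # (adversarial examples are generated per norm), so route
            # the batch to its dedicated BN branch.
            return self.branches[branch_idx](x)
        # Inference: the gate separates inputs by (unknown) perturbation
        # type and softly mixes the normalized branch outputs.
        weights = torch.softmax(self.gate(x), dim=1)            # (B, K)
        outs = torch.stack([bn(x) for bn in self.branches], 1)  # (B, K, C, H, W)
        return (weights[:, :, None, None, None] * outs).sum(dim=1)
```

During adversarial training, one would generate perturbed batches per norm and call the layer with the matching `branch_idx` so each branch's statistics specialize to one perturbation type; at test time the gate alone decides the routing.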