Paper Title
Second-Order Provable Defenses against Adversarial Attacks
Paper Authors
Paper Abstract
A robustness certificate is the minimum distance of a given input to the decision boundary of the classifier (or a lower bound on it). For {\it any} input perturbation with a magnitude smaller than the certificate value, the classification output provably remains unchanged. Computing robustness certificates for neural networks exactly is difficult since it requires solving a non-convex optimization problem. In this paper, we provide computationally efficient robustness certificates for neural networks with differentiable activation functions in two steps. First, we show that if the eigenvalues of the Hessian of the network are bounded, we can compute a robustness certificate in the $l_2$ norm efficiently using convex optimization. Second, we derive a computationally efficient, differentiable upper bound on the curvature of a deep network. We also use the curvature bound as a regularization term during training to boost the network's certified robustness. Putting these results together leads to our proposed {\bf C}urvature-based {\bf R}obustness {\bf C}ertificate (CRC) and {\bf C}urvature-based {\bf R}obust {\bf T}raining (CRT). Our numerical results show that CRT leads to significantly higher certified robust accuracy than interval-bound propagation (IBP) based training: on the MNIST dataset, we achieve certified robust accuracies of 69.79\%, 57.78\%, and 53.19\% on 2-, 3-, and 4-layer networks, respectively, while IBP-based methods achieve 44.96\%, 44.74\%, and 44.66\%.
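As a worked sketch of how a curvature bound yields an $l_2$ certificate (the notation here is assumed for illustration and is not taken verbatim from the abstract): let $f(x)$ denote the logit margin between the predicted class and a competing class, with $f(x) > 0$ when the prediction is correct, and suppose the eigenvalues of the Hessian $\nabla^2 f$ lie in $[-K, K]$. A second-order Taylor bound then gives, for any perturbation $\delta$,
$$f(x+\delta) \;\ge\; f(x) - \|\nabla f(x)\|_2\,\|\delta\|_2 - \frac{K}{2}\,\|\delta\|_2^2,$$
so the prediction provably cannot flip as long as
$$\|\delta\|_2 \;<\; \frac{\sqrt{\|\nabla f(x)\|_2^2 + 2K\,f(x)} - \|\nabla f(x)\|_2}{K}.$$
This closed-form radius is a simple curvature-based lower bound in the spirit of CRC; the certificate described in the abstract is computed more tightly via convex optimization, and CRT additionally penalizes the curvature bound $K$ during training so that the certified radius grows.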