Paper Title

Towards Certifiable Adversarial Sample Detection

Paper Authors

Ilia Shumailov, Yiren Zhao, Robert Mullins, Ross Anderson

Paper Abstract

Convolutional Neural Networks (CNNs) are deployed in more and more classification systems, but adversarial samples can be maliciously crafted to trick them, and are becoming a real threat. There have been various proposals to improve CNNs' adversarial robustness but these all suffer performance penalties or other limitations. In this paper, we provide a new approach in the form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap (CTT). The system can provide certifiable guarantees of detection of adversarial inputs for certain $l_{\infty}$ sizes on a reasonable assumption, namely that the training data have the same distribution as the test data. We develop and evaluate several versions of CTT with a range of defense capabilities, training overheads and certifiability on adversarial samples. Against adversaries with various $l_p$ norms, CTT outperforms existing defense methods that focus purely on improving network robustness. We show that CTT has small false positive rates on clean test data, minimal compute overheads when deployed, and can support complex security policies.
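To make the detection idea concrete, below is a minimal sketch of a taboo-trap style detector, not the authors' implementation: the network is trained so that chosen activations stay below per-unit thresholds on clean data, and an input is flagged at inference time if any monitored activation enters the forbidden ("taboo") region. The function names, the threshold-fitting rule, and the margin parameter are all illustrative assumptions.

```python
import numpy as np

def fit_taboo_thresholds(clean_activations: np.ndarray, margin: float = 0.1) -> np.ndarray:
    """Fit per-unit taboo thresholds just above the largest activation
    observed on clean data (clean_activations: [num_samples, num_units])."""
    return clean_activations.max(axis=0) + margin

def is_adversarial(activations: np.ndarray, thresholds: np.ndarray) -> bool:
    """Flag an input whose monitored activations enter the taboo region,
    i.e. exceed any of the fitted per-unit thresholds."""
    return bool(np.any(activations > thresholds))

# Illustrative usage with random stand-in activations:
rng = np.random.default_rng(0)
clean = rng.normal(size=(1000, 64))           # activations on clean inputs
thresholds = fit_taboo_thresholds(clean)
perturbed = clean[0] + 5.0                    # a shifted activation pattern
print(is_adversarial(clean[0], thresholds))   # False: stays below thresholds
print(is_adversarial(perturbed, thresholds))  # True: crosses a threshold
```

This sketch only captures the detection-time check. In the certifiable variants the paper describes, the training procedure is designed so that, under the stated distributional assumption, any $l_{\infty}$-bounded perturbation above a certain size provably drives some monitored activation into the taboo region, which is what yields the detection guarantee.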
