Paper Title
Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities
Paper Authors
Paper Abstract
We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment. So we must have confidence that our models do not over-rely on this assumption. To this end, we introduce a new method that identifies context-sensitive feature perturbations (e.g. shape, location, texture, colour) to the inputs of image classifiers. We produce these changes by performing small adjustments to the activation values of different layers of a trained generative neural network. Perturbing at layers earlier in the generator causes changes to coarser-grained features; perturbations further on cause finer-grained changes. Unsurprisingly, we find that state-of-the-art classifiers are not robust to any such changes. More surprisingly, when it comes to coarse-grained feature changes, we find that adversarial training against pixel-space perturbations is not just unhelpful: it is counterproductive.
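The abstract describes producing feature perturbations by making small adjustments to the activations of intermediate layers of a trained generator, with earlier layers yielding coarser-grained changes and later layers finer-grained ones. The following is a minimal illustrative sketch of that idea, not the paper's implementation: the generator architecture, layer indices, perturbation size, and the use of random (rather than adversarially chosen) perturbation directions are all assumptions made for illustration.

```python
import torch

def perturbed_image(generator_layers, z, layer_idx, epsilon=0.05):
    """Run latent z through a stack of generator layers, adding a small
    perturbation to the activations at `layer_idx` before continuing the
    forward pass.

    Intuition from the abstract: perturbing an earlier layer_idx changes
    coarser-grained features (e.g. shape, location), while perturbing a
    later layer_idx changes finer-grained features (e.g. texture, colour).
    """
    h = z
    for i, layer in enumerate(generator_layers):
        h = layer(h)
        if i == layer_idx:
            # Small adjustment to the intermediate activations. A random
            # direction is used here for illustration; in practice the
            # direction could be chosen to maximise a classifier's loss.
            h = h + epsilon * torch.randn_like(h)
    return h  # final output is the generated (perturbed) image


if __name__ == "__main__":
    # Toy generator: a purely illustrative stack of layers, not the
    # architecture used in the paper.
    layers = torch.nn.ModuleList([
        torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU()),
        torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU()),
        torch.nn.Linear(256, 3 * 32 * 32),  # flattened "image" output
    ])
    z = torch.randn(1, 64)
    coarse = perturbed_image(layers, z, layer_idx=0)  # early layer: coarse changes
    fine = perturbed_image(layers, z, layer_idx=1)    # later layer: fine changes
```

The resulting images would then be fed to the classifier under evaluation to test whether its predictions remain stable under coarse- versus fine-grained feature changes.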