Paper Title
Evading Deepfake-Image Detectors with White- and Black-Box Attacks
Paper Authors
Paper Abstract
It is now possible to synthesize highly realistic images of people who do not exist. Such content has, for example, been implicated in the creation of fraudulent social-media profiles responsible for disinformation campaigns. Significant efforts are therefore being deployed to detect synthetically generated content. One popular forensic approach trains a neural network to distinguish real from synthetic content. We show that such forensic classifiers are vulnerable to a range of attacks that reduce the classifier to near-0% accuracy. We develop five attack case studies against a state-of-the-art classifier that, despite being trained on only one generator, achieves an area under the ROC curve (AUC) of 0.95 on almost all existing image generators. With full access to the classifier, we can flip the lowest bit of each pixel in an image to reduce the classifier's AUC to 0.0005; perturb 1% of the image area to reduce the classifier's AUC to 0.08; or add a single noise pattern in the synthesizer's latent space to reduce the classifier's AUC to 0.17. We also develop a black-box attack that, with no access to the target classifier, reduces the AUC to 0.22. These attacks reveal significant vulnerabilities of certain image-forensic classifiers.
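To make the lowest-bit attack concrete, the sketch below shows one way such a white-box perturbation can be computed. It is an illustrative single-step variant, not the authors' implementation: it assumes a differentiable PyTorch classifier `forensic_model` whose second output column is the "synthetic" logit (both the name and the output layout are hypothetical), and it takes one signed-gradient step of a single 8-bit quantization level (1/255), the magnitude of a lowest-bit flip.

```python
# Minimal sketch of a white-box lowest-bit attack on a forensic
# classifier. Assumptions (not from the paper): a PyTorch model
# `forensic_model` returning (N, 2) logits with column 1 = "synthetic".
import torch

def lsb_attack(forensic_model, image_uint8):
    """image_uint8: (C, H, W) torch.uint8 tensor of a synthetic image."""
    # Work in float [0, 1] so gradients can flow through the classifier.
    x = image_uint8.float().div(255.0).unsqueeze(0).requires_grad_(True)

    # Logit assigned to the "synthetic" class; the attack lowers it so
    # the image is scored as real.
    synthetic_score = forensic_model(x)[:, 1].sum()
    synthetic_score.backward()

    # Step against the gradient by one quantization level (1/255), i.e.
    # each 8-bit pixel value changes by at most one, then re-quantize.
    x_adv = (x - (1.0 / 255.0) * x.grad.sign()).clamp(0.0, 1.0)
    return (x_adv.squeeze(0) * 255.0).round().to(torch.uint8)
```

In practice such perturbed images would be fed back to the classifier over a whole test set to measure the drop in AUC that the abstract reports.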