Paper Title

Purifier: Defending Data Inference Attacks via Transforming Confidence Scores

Authors

Ziqi Yang, Lijin Wang, Da Yang, Jie Wan, Ziming Zhao, Ee-Chien Chang, Fan Zhang, Kui Ren

Abstract

Neural networks are susceptible to data inference attacks such as the membership inference attack, the adversarial model inversion attack, and the attribute inference attack, where the attacker can infer useful information about a data sample, such as its membership, its reconstruction, or its sensitive attributes, from the confidence scores predicted by the target classifier. In this paper, we propose a method, namely PURIFIER, to defend against membership inference attacks. It transforms the confidence score vectors predicted by the target classifier and makes the purified confidence scores indistinguishable between members and non-members in individual shape, statistical distribution, and prediction label. Experimental results show that PURIFIER defends against membership inference attacks with high effectiveness and efficiency, outperforming previous defense methods while incurring negligible utility loss. Moreover, our further experiments show that PURIFIER is also effective in defending against adversarial model inversion attacks and attribute inference attacks. For example, when PURIFIER is deployed in our experiments, the inversion error on the FaceScrub530 classifier increases by a factor of about 4, and the attribute inference accuracy drops significantly.
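The abstract describes PURIFIER as a module that reshapes the target classifier's confidence vectors before they are released to the user. As a rough illustration of that idea, below is a minimal PyTorch sketch of a confidence-score purifier, assuming an autoencoder-style transformation; the `ConfidencePurifier` class, its layer sizes, and the usage shown are illustrative assumptions for this sketch, not the authors' exact architecture or training procedure.

```python
# Minimal sketch of a confidence-score purifier (illustrative only).
# Assumption: an autoencoder that maps a raw confidence vector to a
# purified one; layer sizes and names are hypothetical.
import torch
import torch.nn as nn

class ConfidencePurifier(nn.Module):
    """Autoencoder that transforms a confidence vector into a purified one."""

    def __init__(self, num_classes: int, hidden: int = 32, latent: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, latent), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # Re-normalize with softmax so the output remains a valid
        # probability vector over the classes.
        return torch.softmax(self.decoder(self.encoder(scores)), dim=-1)

# Hypothetical usage: purify the target classifier's outputs at inference
# time and serve the purified scores to the user instead of the raw ones.
num_classes = 10
purifier = ConfidencePurifier(num_classes)
raw_scores = torch.softmax(torch.randn(4, num_classes), dim=-1)
purified = purifier(raw_scores)
print(purified.sum(dim=-1))  # each row sums to 1
```

A real deployment would train such a network so that the purified scores of members and non-members become indistinguishable in shape, distribution, and prediction label, as the abstract describes; the training objective is omitted from this sketch.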
