Paper Title
Defending Model Inversion and Membership Inference Attacks via Prediction Purification
Paper Authors
Paper Abstract
Neural networks are susceptible to data inference attacks, such as the model inversion attack and the membership inference attack, in which the attacker infers the reconstruction or the membership of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a unified approach, namely the purification framework, to defend against data inference attacks. It purifies the confidence score vectors predicted by the target classifier by reducing their dispersion. The purifier can be further specialized to defend against a particular attack via adversarial learning. We evaluate our approach on benchmark datasets and classifiers. We show that when the purifier is dedicated to one attack, it naturally defends against the other, which empirically demonstrates the connection between the two attacks. The purifier can effectively defend against both attacks: for example, it reduces the membership inference accuracy by up to 15% and increases the model inversion error by a factor of up to 4, while incurring less than a 0.4% drop in classification accuracy and less than 5.5% distortion to the confidence scores.
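The abstract does not give implementation details, so the following is a minimal sketch, assuming PyTorch, of how the two ideas it names could look in code: a purifier that reshapes confidence score vectors, and its adversarial specialization against membership inference. The `Purifier` and `MembershipDiscriminator` architectures, the `adversarial_step` training procedure, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Purifier(nn.Module):
    """Hypothetical purifier: a small autoencoder over confidence score
    vectors. Reconstructing scores through a narrow bottleneck concentrates
    them on a learned manifold, reducing their dispersion (the architecture
    is an assumption; the abstract does not specify one)."""

    def __init__(self, num_classes: int, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden // 2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(hidden // 2, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # Re-normalize so the released output is still a valid confidence vector.
        return F.softmax(self.decoder(self.encoder(scores)), dim=-1)


class MembershipDiscriminator(nn.Module):
    """Hypothetical adversary used only during training: predicts member
    vs. non-member from a purified confidence vector."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_classes, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        return self.net(scores)  # membership logit


def adversarial_step(purifier, disc, opt_p, opt_d, member_scores, nonmember_scores):
    """One alternating update specializing the purifier against membership
    inference: the discriminator learns to separate purified member scores
    from non-member scores, then the purifier learns to erase that signal
    while staying close to the original scores (loss weight is illustrative)."""
    # Discriminator step: tell purified member predictions from non-member ones.
    with torch.no_grad():
        pur_m, pur_n = purifier(member_scores), purifier(nonmember_scores)
    d_loss = (
        F.binary_cross_entropy_with_logits(disc(pur_m), torch.ones(pur_m.size(0), 1))
        + F.binary_cross_entropy_with_logits(disc(pur_n), torch.zeros(pur_n.size(0), 1))
    )
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Purifier step: make members look like non-members, limit score distortion.
    pur_m = purifier(member_scores)
    p_loss = (
        F.binary_cross_entropy_with_logits(disc(pur_m), torch.zeros(pur_m.size(0), 1))
        + 10.0 * F.mse_loss(pur_m, member_scores)
    )
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
    return d_loss.item(), p_loss.item()


if __name__ == "__main__":
    num_classes = 10
    purifier, disc = Purifier(num_classes), MembershipDiscriminator(num_classes)
    opt_p = torch.optim.Adam(purifier.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
    # Stand-in confidence scores; in practice these come from the target classifier.
    member = F.softmax(torch.randn(8, num_classes), dim=-1)
    nonmember = F.softmax(torch.randn(8, num_classes), dim=-1)
    print(adversarial_step(purifier, disc, opt_p, opt_d, member, nonmember))
```

Under this reading, only the purified vector would be released in place of the raw confidence scores at inference time, and the discriminator would be discarded after training; the reconstruction-style purification is also what would plausibly blunt model inversion, since the attacker only ever sees scores projected onto the learned manifold.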