Paper Title
Label-only Model Inversion Attack: The Attack that Requires the Least Information
Authors
Abstract
In a model inversion attack, an adversary attempts to reconstruct the data records, used to train a target model, using only the model's output. In launching a contemporary model inversion attack, the strategies discussed are generally based on either predicted confidence score vectors, i.e., black-box attacks, or the parameters of a target model, i.e., white-box attacks. However, in the real world, model owners usually only give out the predicted labels; the confidence score vectors and model parameters are hidden as a defense mechanism to prevent such attacks. Unfortunately, we have found a model inversion method that can reconstruct the input data records based only on the output labels. We believe this is the attack that requires the least information to succeed and, therefore, has the best applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to the decision boundary of the target model. The distance, then, is used to generate confidence score vectors which are adopted to train an attack model to reconstruct the data records. The experimental results show that highly recognizable data records can be reconstructed with far less information than existing methods.
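The key idea in the abstract, turning a label-only error rate into a distance to the decision boundary and then into a surrogate confidence vector, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy target `label_only_model`, the use of isotropic Gaussian noise, the locally linear boundary approximation (under which the label-flip rate equals Φ(-d/σ)), and the sigmoid mapping in `surrogate_confidence`. The paper's actual procedure may differ in every one of these choices.

```python
import numpy as np
from statistics import NormalDist


def label_only_model(x):
    """Hypothetical target model: exposes only hard labels (0 or 1).

    The true boundary is the hyperplane x[0] = 0; the attacker never sees it.
    """
    w = np.array([1.0, 0.0])
    return (x @ w > 0).astype(int)


def estimate_distance(predict_label, x, sigma=1.0, n_samples=20000, rng=None):
    """Estimate the distance from x to the decision boundary from labels only.

    Assumption: the boundary is locally linear, so the fraction of noisy
    copies of x whose label flips is Phi(-d / sigma), which we invert for d.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    base_label = predict_label(x[None, :])[0]
    noisy = x[None, :] + sigma * rng.standard_normal((n_samples, x.size))
    error_rate = np.mean(predict_label(noisy) != base_label)
    # Keep the rate inside (0, 0.5] so the inverse CDF is finite and d >= 0.
    error_rate = float(np.clip(error_rate, 1.0 / n_samples, 0.5))
    return -sigma * NormalDist().inv_cdf(error_rate)


def surrogate_confidence(label, distance, n_classes=2, scale=1.0):
    """Hypothetical mapping from (label, distance) to a confidence vector.

    Larger distances give the predicted label more mass; the remainder is
    spread evenly over the other classes.
    """
    p = 1.0 / (1.0 + np.exp(-distance / scale))
    vec = np.full(n_classes, (1.0 - p) / (n_classes - 1))
    vec[label] = p
    return vec


# Median distance over a small set of records, as the abstract describes.
records = np.array([[0.5, 1.0], [1.5, 0.0], [2.5, -1.0]])
distances = [estimate_distance(label_only_model, r) for r in records]
median_distance = float(np.median(distances))
print(median_distance)
print(surrogate_confidence(1, median_distance))
```

In this toy setting the true distances are 0.5, 1.5, and 2.5, so the estimates can be checked directly. In a real attack the surrogate vectors would then serve as inputs for training the attack model that reconstructs the records, a step this sketch does not cover.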