Paper Title
On the Risks of Collecting Multidimensional Data Under Local Differential Privacy
Paper Authors
Paper Abstract
The private collection of multiple statistics from a population is a fundamental statistical problem. One possible approach is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for frequency estimation of single and multiple attributes. These studies have mainly focused on improving the utility of the algorithms so that the server can perform the estimations accurately. In this paper, we investigate privacy threats (re-identification and attribute inference attacks) against LDP protocols for multidimensional data following two state-of-the-art solutions for frequency estimation of multiple attributes. To broaden the scope of our study, we also experimentally assess five widely used LDP protocols, namely generalized randomized response, optimal local hashing, subset selection, RAPPOR, and optimal unary encoding. Finally, we propose a countermeasure that improves both utility and robustness against the identified threats. Our contributions can help practitioners who aim to collect users' statistics privately decide which LDP mechanism best fits their needs.
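For readers unfamiliar with the protocols named above, the following is a minimal sketch of generalized randomized response for a single attribute, written from the standard textbook definition rather than taken from the paper. The function names `grr_perturb` and `grr_estimate`, the toy domain, and the choice of epsilon are illustrative assumptions; the sketch assumes epsilon > 0 and a domain with at least two values.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng=random):
    """Client side: keep the true value with probability p,
    otherwise report one of the other k - 1 values uniformly at random."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimates from the noisy reports,
    using f(v) = (C(v) / n - q) / (p - q)."""
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting any one other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    # Hypothetical toy data: one categorical attribute with four values.
    rng = random.Random(0)
    domain = ["a", "b", "c", "d"]
    data = ["a"] * 7000 + ["b"] * 2000 + ["c"] * 1000
    reports = [grr_perturb(v, domain, 1.0, rng) for v in data]
    print(grr_estimate(reports, domain, 1.0))
```

Each individual report equals the user's true value with probability p, which grows with epsilon, while the server only recovers accurate frequencies in aggregate. Individual estimates can fall slightly below zero because of the debiasing step.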