Paper Title

Bounding, Concentrating, and Truncating: Unifying Privacy Loss Composition for Data Analytics

Paper Authors

Mark Cesar, Ryan Rogers

Paper Abstract

Differential privacy (DP) provides rigorous privacy guarantees on individuals' data while also allowing accurate statistics to be computed on the overall, sensitive dataset. To design a private system, private algorithms must first be designed that can quantify the privacy loss of each outcome that is released. However, private algorithms that inject noise into the computation are not sufficient to ensure individuals' data is protected, because many noisy results ultimately concentrate around the true, non-privatized result. Hence there have been several works providing precise formulas for how the privacy loss accumulates over multiple interactions with private algorithms. However, these formulas either provide very general bounds on the privacy loss, at the cost of being overly pessimistic for certain types of private algorithms, or they can be too narrow in scope to apply to general privacy systems. In this work, we unify existing privacy loss composition bounds for special classes of differentially private (DP) algorithms along with general DP composition bounds. In particular, we provide strong privacy loss bounds when an analyst may select pure DP, bounded range (e.g., exponential mechanism), or concentrated DP mechanisms in any order. We also provide optimal privacy loss bounds that apply when an analyst can select pure DP and bounded range mechanisms in a batch, i.e., non-adaptively. Further, when an analyst selects mechanisms within each class adaptively, we show a difference in privacy loss between different, predetermined orderings of pure DP and bounded range mechanisms. Lastly, we compare the composition bounds of Laplace and Gaussian mechanisms based on histogram datasets.
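As a rough illustration of the kind of comparison the abstract mentions (not taken from the paper), the sketch below computes three standard composition bounds for k noisy histogram releases: basic and advanced composition for the Laplace mechanism, and zero-concentrated DP (zCDP) composition for the Gaussian mechanism. The parameters k, eps0, sigma, and delta are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (illustrative, not the paper's analysis): compare generic
# composition bounds for k releases of a privatized histogram.
import math

k = 50          # number of histogram releases (assumed)
delta = 1e-6    # target failure probability for the final (eps, delta) guarantee
eps0 = 0.1      # per-release pure DP parameter for the Laplace mechanism (assumed)

# Basic composition: k releases, each eps0-DP, are jointly (k * eps0)-DP.
eps_basic = k * eps0

# Advanced composition: k eps0-DP releases are (eps, delta)-DP with
# eps = sqrt(2 k ln(1/delta)) * eps0 + k * eps0 * (e^{eps0} - 1).
eps_advanced = math.sqrt(2 * k * math.log(1 / delta)) * eps0 \
               + k * eps0 * (math.exp(eps0) - 1)

# Gaussian mechanism: adding N(0, sigma^2) noise to each bin of a histogram
# with L2 sensitivity 1 satisfies rho-zCDP with rho = 1 / (2 sigma^2).
# zCDP composes additively, and k*rho-zCDP implies
# (k*rho + 2*sqrt(k*rho*ln(1/delta)), delta)-DP.
sigma = 5.0     # per-release Gaussian noise scale (assumed)
rho_total = k * (1.0 / (2 * sigma ** 2))
eps_gaussian = rho_total + 2 * math.sqrt(rho_total * math.log(1 / delta))

print(f"Basic composition (Laplace):    eps = {eps_basic:.3f}")
print(f"Advanced composition (Laplace): eps = {eps_advanced:.3f}, delta = {delta}")
print(f"zCDP composition (Gaussian):    eps = {eps_gaussian:.3f}, delta = {delta}")
```

These are generic bounds; the tighter, mechanism-specific analyses that the paper unifies (e.g., for bounded range and concentrated DP mechanisms) can improve on them.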
