论文标题
原子存在的私人分位数估计
Private Quantiles Estimation in the Presence of Atoms
论文作者
论文摘要
我们考虑了来自数据集的分布的多个分位数(MQ)的差异私有估计,该分布是现代数据分析中的一个关键构建块。我们将最近的非平滑逆敏感性(IS)机理应用于此特定问题。我们确定所得方法与最近发表的临时算法联合Exp密切相关。特别是,它们具有相同的计算复杂性和相似的效率。我们证明了连续分布的这两种算法的统计一致性。此外,我们从理论和经验上都证明了这种方法在峰值分布的情况下严重缺乏性能,这可以在原子存在下降解至潜在的灾难性影响。它的平滑版本(即,通过将最大内核应用于其输出密度)可以解决此问题,但仍然是实施的开放挑战。作为代理人,我们提出了一种称为启发式平滑的关节Exp(HSJointExp)的简单且数值高效的方法,该方法赋予了广泛分布的性能保证,并在有问题的数据集中获得了更好的数量级的结果。
We consider the differentially private estimation of multiple quantiles (MQ) of a distribution from a dataset, a key building block in modern data analysis. We apply the recent non-smoothed Inverse Sensitivity (IS) mechanism to this specific problem. We establish that the resulting method is closely related to the recently published ad hoc algorithm JointExp. In particular, they share the same computational complexity and a similar efficiency. We prove the statistical consistency of these two algorithms for continuous distributions. Furthermore, we demonstrate both theoretically and empirically that this method suffers from an important lack of performance in the case of peaked distributions, which can degrade up to a potentially catastrophic impact in the presence of atoms. Its smoothed version (i.e. by applying a max kernel to its output density) would solve this problem, but remains an open challenge to implement. As a proxy, we propose a simple and numerically efficient method called Heuristically Smoothed JointExp (HSJointExp), which is endowed with performance guarantees for a broad class of distributions and achieves results that are orders of magnitude better on problematic datasets.