Paper Title
Distributed Differentially Private Mutual Information Ranking and Its Applications
Authors
Abstract
Computing Mutual Information (MI) quantifies the amount of information shared between a pair of random variables. Automated feature selection techniques based on MI ranking are regularly used to extract information from sensitive datasets that exceed petabytes in size and span millions of features and classes. A series of one-vs-all MI computations can be cascaded to produce n-fold MI results that rapidly pinpoint informative relationships. This ability to quickly identify the most informative relationships in datasets covering billions of users creates privacy concerns. In this paper, we present Distributed Differentially Private Mutual Information (DDP-MI), a privacy-safe, fast batch MI computation for scenarios such as feature selection, segmentation, ranking, and query expansion. The distributed implementation is protected with global-model differential privacy, which provides strong assurances against a wide range of privacy attacks. We also show that DDP-MI can substantially improve the efficiency of MI calculations compared to standard implementations on a large-scale public dataset.
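To make the idea of a differentially private one-vs-all MI ranking concrete, the sketch below scores each feature against a label and sorts the results. It is not the DDP-MI algorithm, whose mechanism the abstract does not specify. It substitutes a standard technique: a trusted aggregator (the global model) adds Laplace noise to the joint count table before MI is computed. Every name here (`mutual_information`, `noisy_joint_counts`, `rank_features`) is hypothetical. The sketch assumes binary features and labels with a publicly known domain, one record per user (so each count table has L1 sensitivity 1), and an even split of the privacy budget across features.

```python
import math
import random
from collections import Counter


def mutual_information(joint_counts):
    """MI in nats from a dict {(x, y): count}; non-positive cells are dropped."""
    cells = {k: c for k, c in joint_counts.items() if c > 0}
    total = sum(cells.values())
    if total <= 0:
        return 0.0
    px, py = Counter(), Counter()
    for (x, y), c in cells.items():
        px[x] += c
        py[y] += c
    mi = 0.0
    for (x, y), c in cells.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / total) * math.log(c * total / (px[x] * py[y]))
    return max(mi, 0.0)  # guard against tiny negative rounding error


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_joint_counts(xs, ys, x_domain, y_domain, epsilon, rng):
    """Laplace mechanism on the contingency table.

    Assumption: each user contributes one (x, y) pair, so adding or removing
    a user changes one cell by 1 and the L1 sensitivity is 1.
    """
    counts = Counter(zip(xs, ys))
    scale = 1.0 / epsilon
    # Noise is added to every cell of the public domain, including empty ones.
    return {(x, y): counts[(x, y)] + laplace_noise(scale, rng)
            for x in x_domain for y in y_domain}


def rank_features(features, labels, epsilon, domain=(0, 1), seed=None):
    """Rank features by noisy MI with the label (one-vs-all).

    Assumption: the total budget epsilon is split evenly across features
    (basic sequential composition).
    """
    rng = random.Random(seed)
    eps_each = epsilon / len(features)
    scores = {}
    for name, values in features.items():
        table = noisy_joint_counts(values, labels, domain, domain, eps_each, rng)
        scores[name] = mutual_information(table)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rank_features({"a": a_values, "b": b_values}, labels, epsilon=1.0)` returns a list of `(feature_name, noisy_mi)` pairs in descending order. Smaller values of `epsilon` add more noise, so the ranking of weakly informative features becomes less stable.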