Paper Title

Balance is key: Private median splits yield high-utility random trees

Authors

Shorya Consul, Sinead A. Williamson

Abstract

Random forests are a popular method for classification and regression due to their versatility. However, this flexibility can come at the cost of user privacy, since training random forests requires multiple data queries, often on small, identifiable subsets of the training data. Privatizing these queries typically comes at a high utility cost, in large part because we are privatizing queries on small subsets of the data, which are easily corrupted by added noise. In this paper, we propose DiPriMe forests, a novel tree-based ensemble method for differentially private regression and classification, which is appropriate for real or categorical covariates. We generate splits using a differentially private version of the median, which encourages balanced leaf nodes. By avoiding low-occupancy leaf nodes, we avoid low signal-to-noise ratios when privatizing the leaf node sufficient statistics. We show theoretically and empirically that the resulting algorithm exhibits high utility, while ensuring differential privacy.
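The idea of splitting at a differentially private median can be illustrated with a minimal sketch. The code below uses a generic exponential mechanism over the gaps between sorted data points; it is not necessarily the mechanism used in the paper. The function name `private_median`, the public bounds `lower` and `upper`, and the choice to treat the rank-based utility as having sensitivity 1 are all assumptions made for illustration.

```python
import numpy as np


def private_median(values, lower, upper, epsilon, rng=None):
    """Sample an approximate median with the exponential mechanism.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    `lower` and `upper` are assumed to be public bounds on the data.
    """
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    rng = np.random.default_rng() if rng is None else rng

    # Clip to the public range and sort.
    x = np.sort(np.clip(np.asarray(values, dtype=float), lower, upper))
    n = x.size

    # n + 2 edges define n + 1 candidate intervals covering [lower, upper].
    edges = np.concatenate(([lower], x, [upper]))
    lengths = np.diff(edges)

    # Interval i has exactly i data points to its left; the utility is the
    # negative distance of that rank from n / 2 (assumed sensitivity 1).
    ranks = np.arange(n + 1)
    utility = -np.abs(ranks - n / 2.0)

    # Weight of an interval: its length times exp(epsilon * utility / 2).
    # Zero-length intervals get log-weight -inf and are never selected.
    with np.errstate(divide="ignore"):
        log_w = np.log(lengths) + 0.5 * epsilon * utility
    log_w -= log_w.max()
    probs = np.exp(log_w)
    probs /= probs.sum()

    # Pick an interval, then a uniform point inside it.
    i = rng.choice(n + 1, p=probs)
    return rng.uniform(edges[i], edges[i + 1])
```

Because the utility rewards split points whose rank is close to n / 2, the two children of a split tend to receive similar numbers of points, which is the balance property the abstract relies on. Smaller values of `epsilon` give stronger privacy but spread the sampled split further from the true median.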
