Paper Title
Scalable Privacy-Preserving Distributed Learning
Paper Authors
Paper Abstract
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of cooperative gradient descent and the evaluation of the obtained model, and by preserving data and model confidentiality in a passive-adversary model with up to N-1 colluding parties. SPINDLE uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate SPINDLE for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.
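To make the MapReduce-style workflow described in the abstract concrete, below is a minimal plaintext sketch of cooperative gradient descent for logistic regression: each data provider computes a local gradient on its own shard (the MAP step), the gradients are aggregated (the REDUCE step), and the shared model is updated. This is only an illustration of the data flow; in SPINDLE these steps run under multiparty homomorphic encryption, which this sketch omits entirely. The function names, parameters, and toy data are illustrative, not taken from the paper.

```python
import numpy as np

def local_gradient(w, X, y):
    """MAP: logistic-regression gradient on one provider's local shard."""
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (preds - y) / len(y)

def cooperative_gd(providers, n_features, lr=0.1, iters=100):
    """Cooperative gradient descent across data providers (plaintext sketch)."""
    w = np.zeros(n_features)
    for _ in range(iters):
        grads = [local_gradient(w, X, y) for X, y in providers]  # MAP
        w -= lr * np.mean(grads, axis=0)                         # REDUCE + update
    return w

# Toy usage: 4 providers, each holding a private shard of the dataset.
rng = np.random.default_rng(0)
true_w = rng.normal(size=8)
providers = []
for _ in range(4):
    X = rng.normal(size=(250, 8))
    y = (X @ true_w > 0).astype(float)
    providers.append((X, y))

w = cooperative_gd(providers, n_features=8)
```

In the actual system, the local gradients and the model would be ciphertexts under a collectively generated key, so the aggregation in the REDUCE step never exposes any provider's data or the intermediate model in the clear.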