Paper Title
Privacy-Preserving Distributed Learning in the Analog Domain
Paper Authors
Paper Abstract
We consider the critical problem of distributed learning over data while keeping it private from the computational servers. The state-of-the-art approaches to this problem rely on quantizing the data into a finite field, so that cryptographic approaches for secure multiparty computing can then be employed. These approaches, however, can result in substantial accuracy losses due to fixed-point representation of the data and computation overflows. To address these critical issues, we propose a novel algorithm to solve the problem when data is in the analog domain, e.g., the field of real/complex numbers. We characterize the privacy of the data from both information-theoretic and cryptographic perspectives, while establishing a connection between the two notions in the analog domain. More specifically, the well-known connection between the distinguishing security (DS) and the mutual information security (MIS) metrics is extended from the discrete domain to the continuous domain. This is then utilized to bound the amount of information about the data leaked to the servers in our protocol, in terms of the DS metric, using well-known results on the capacity of a single-input multiple-output (SIMO) channel with correlated noise. It is shown how the proposed framework can be adopted to perform computation tasks when data is represented using floating-point numbers. We then show that this leads to a fundamental trade-off between the privacy level of the data and the accuracy of the result. As an application, we also show how to train a machine learning model while keeping the data as well as the trained model private. Numerical results are then shown for experiments on the MNIST dataset. Furthermore, experimental advantages are shown compared to fixed-point implementations over finite fields.
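The abstract leaves the protocol itself to the paper, but the core mechanism it alludes to, hiding real-valued data behind correlated Gaussian noise so that honest reconstruction stays exact while each server sees only a noisy observation, can be sketched in a few lines. The Python sketch below is our own illustration under assumed details, not the authors' protocol: the names share, reconstruct, n_servers, t_colluding, and sigma are hypothetical, and evaluating at roots of unity is one standard way to make the noise terms cancel on averaging.

import numpy as np

# Illustrative sketch only (assumed details, not the paper's exact scheme):
# hide a real vector x in the constant term of a random polynomial
#   f(z) = x + n_1*z + ... + n_t*z^t
# with complex Gaussian noise coefficients n_j, and hand server i the
# evaluation f(alpha_i) at an N-th root of unity alpha_i.
def share(x, n_servers, t_colluding, sigma=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    shape = (t_colluding,) + np.shape(x)
    noise = sigma * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    alphas = np.exp(2j * np.pi * np.arange(n_servers) / n_servers)
    # Each share exposes x only through additive Gaussian noise; the abstract's
    # leakage bound treats such observations as a SIMO channel with correlated noise.
    return [x + sum(noise[j] * a ** (j + 1) for j in range(t_colluding))
            for a in alphas]

def reconstruct(shares):
    # Averaging over all N-th roots of unity cancels every noise term,
    # since sum_i alpha_i^j = 0 for 1 <= j <= N-1.
    return np.mean(shares, axis=0).real

rng = np.random.default_rng(0)
x = np.array([0.25, -1.5, 3.0])
shares = share(x, n_servers=8, t_colluding=3, sigma=10.0, rng=rng)
print(np.allclose(reconstruct(shares), x))  # True: exact up to float rounding

Because the sharing map is linear, servers can apply linear operations to their shares and the result reconstructs the same way (polynomial computations need more servers, since products raise the degree). Raising sigma strengthens privacy but, under a fixed floating-point format, consumes dynamic range and hence precision, which is one concrete way to read the privacy-accuracy trade-off the abstract describes.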