论文标题
改组的联合学习模型:隐私,沟通和准确权衡取舍
Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs
论文作者
论文摘要
我们考虑由联邦学习(FL)框架激发的沟通效率和隐私要求的分布式经验风险最小化(ERM)优化问题。在FL的背景下,对传统ERM问题的独特挑战包括(i)需要在客户的数据上提供隐私保证,(ii)压缩客户与服务器之间的通信,因为客户可能具有低频带宽度的链接,(iii)在每一轮客户的一小部分中,在服务器与客户之间的每一轮沟通中都可以在服务器和客户之间进行动态客户群体工作。为了解决这些挑战,我们为几个$ \ ell_p $ spaces的私人平均值估计而开发的(最佳)沟通效率方案,为ERM优化解决方案的每种迭代提供有效的梯度聚合。我们还提供了均值和上限,以进行平均估计,并具有对任意$ \ ell_p $空格的隐私和通信约束。为了获得整体沟通,隐私和优化性能操作点,我们将其与此设置固有的隐私放大机会相结合。我们的解决方案利用了客户采样和数据采样在每个客户端(通过随机梯度下降)以及使用匿名的最近开发的隐私框架提供的固有隐私放大,这些隐私放大有效地提出了对服务器响应的有效呈现,这些框架对服务器响应有效地呈现了相对于客户的随机改组。将这些结合在一起,我们证明了人们可以在使用完全精确通信的最新方法中获得相同的隐私,优化 - 性能的操作点,但以较低的沟通成本,即有效地获得“免费”的通信效率。
We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements, motivated by the federated learning (FL) framework. Unique challenges to the traditional ERM problem in the context of FL include (i) need to provide privacy guarantees on clients' data, (ii) compress the communication between clients and the server, since clients might have low-bandwidth links, (iii) work with a dynamic client population at each round of communication between the server and the clients, as a small fraction of clients are sampled at each round. To address these challenges we develop (optimal) communication-efficient schemes for private mean estimation for several $\ell_p$ spaces, enabling efficient gradient aggregation for each iteration of the optimization solution of the ERM. We also provide lower and upper bounds for mean estimation with privacy and communication constraints for arbitrary $\ell_p$ spaces. To get the overall communication, privacy, and optimization performance operation point, we combine this with privacy amplification opportunities inherent to this setup. Our solution takes advantage of the inherent privacy amplification provided by client sampling and data sampling at each client (through Stochastic Gradient Descent) as well as the recently developed privacy framework using anonymization, which effectively presents to the server responses that are randomly shuffled with respect to the clients. Putting these together, we demonstrate that one can get the same privacy, optimization-performance operating point developed in recent methods that use full-precision communication, but at a much lower communication cost, i.e., effectively getting communication efficiency for "free".