Paper Title
Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond
Authors
Abstract
Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence: problems, data, communication, and computation. Our aim is to provide a fresh and unique perspective on how these elements should work together in an effective and coherent manner. In particular, we provide a selective review of recent techniques developed for optimizing non-convex models (i.e., problem classes) and processing batch and streaming data (i.e., data types) over networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off computation and communication costs. Practical issues and future research directions are also discussed.