Paper Title
Non-Coherent Over-the-Air Decentralized Gradient Descent
Paper Authors
Paper Abstract
Implementing Decentralized Gradient Descent (DGD) in wireless systems is challenging due to noise, fading, and limited bandwidth, necessitating topology awareness, transmission scheduling, and the acquisition of channel state information (CSI) to mitigate interference and maintain reliable communications. These operations may result in substantial signaling overhead and scalability challenges in large networks lacking central coordination. This paper introduces a scalable DGD algorithm that eliminates the need for scheduling, topology information, or CSI (both average and instantaneous). At its core is a Non-Coherent Over-The-Air (NCOTA) consensus scheme that exploits a noisy energy superposition property of wireless channels. Nodes encode their local optimization signals into energy levels within an OFDM frame and transmit simultaneously, without coordination. The key insight is that the received energy equals, on average, the sum of the energies of the transmitted signals, scaled by their respective average channel gains, akin to a consensus step. This property enables unbiased consensus estimation, utilizing average channel gains as mixing weights, thereby removing the need for their explicit design or for CSI. Introducing a consensus stepsize mitigates consensus estimation errors due to energy fluctuations around their expected values. For strongly convex problems, it is shown that the expected squared distance between the local models and the global optimum vanishes at a rate of $O(1/\sqrt{k})$ after $k$ iterations, with suitably decreasing learning and consensus stepsizes. Extensions accommodate a broad class of fading models and frequency-selective channels. Numerical experiments on image classification demonstrate faster convergence in terms of running time compared to state-of-the-art schemes, especially in dense network scenarios.
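To make the energy-superposition property concrete, the following Monte-Carlo sketch checks the statistical fact the abstract relies on: when nodes transmit simultaneously over independent zero-mean fading channels and the receiver measures energy non-coherently, the expected received energy is the sum of transmitted energies weighted by the average channel gains, plus the noise variance. This is a minimal toy illustration, not the paper's actual NCOTA encoder/decoder; all names and parameter values (`gamma`, `E`, `sigma2`, the number of nodes) are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (hypothetical, not taken from the paper)
N = 8                               # number of transmitting nodes
gamma = rng.uniform(0.2, 1.0, N)    # average channel gains E[|h_j|^2]
E = rng.uniform(0.5, 2.0, N)        # energy levels encoding local signals
sigma2 = 1e-2                       # receiver noise variance
trials = 200_000                    # Monte-Carlo rounds

def crandn(*shape):
    """Standard circularly-symmetric complex Gaussian samples (unit variance)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Rayleigh fading: h_j ~ CN(0, gamma_j), independent across nodes and rounds
h = crandn(trials, N) * np.sqrt(gamma)
n = crandn(trials) * np.sqrt(sigma2)           # receiver noise
y = h @ np.sqrt(E) + n                         # superposed received signal
rx_energy = np.abs(y) ** 2                     # non-coherent energy measurement

# Cross terms between different nodes' signals vanish in expectation,
# so the mean received energy is a gamma-weighted sum of the energies:
print(rx_energy.mean())           # ~ gamma @ E + sigma2 (up to MC error)
print(gamma @ E + sigma2)
```

In the paper's scheme, this mean property is what lets each node treat the measured energy (after removing the known noise floor) as an unbiased estimate of a neighborhood average, with the average channel gains playing the role of mixing weights; the consensus stepsize mentioned in the abstract then damps the round-to-round fluctuations of the measurement around this mean. Neither the energy encoding of signed model updates nor the stepsize schedule is reproduced in this sketch.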