Paper Title

Low-rank Gradient Approximation For Memory-Efficient On-device Training of Deep Neural Network

Paper Authors

Mary Gooneratne, Khe Chai Sim, Petr Zadrazil, Andreas Kabel, Françoise Beaufays, Giovanni Motta

Paper Abstract

Training machine learning models on mobile devices has the potential of improving both privacy and accuracy of the models. However, one of the major obstacles to achieving this goal is the memory limitation of mobile devices. Reducing training memory enables models with high-dimensional weight matrices, like automatic speech recognition (ASR) models, to be trained on-device. In this paper, we propose approximating the gradient matrices of deep neural networks using a low-rank parameterization as an avenue to save training memory. The low-rank gradient approximation enables more advanced, memory-intensive optimization techniques to be run on device. Our experimental results show that we can reduce the training memory by about 33.0% for Adam optimization. It uses comparable memory to momentum optimization and achieves a 4.5% relative lower word error rate on an ASR personalization task.
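
The abstract describes the approach only at a high level. Below is a minimal NumPy sketch, not the authors' implementation, of the underlying memory argument: if a weight matrix's gradient is represented by rank-r factors instead of a full m-by-n matrix, Adam's two moment buffers only need to cover r(m+n) entries rather than mn. The truncated-SVD factorization, function names, shapes, and rank value here are illustrative assumptions.

```python
# Minimal sketch (assumed names/shapes, not the paper's exact method):
# compare Adam optimizer-state memory for a full gradient matrix versus
# a rank-r factorization of that gradient.
import numpy as np

def low_rank_factors(grad: np.ndarray, rank_r: int):
    """Rank-r approximation of a gradient matrix via truncated SVD (illustrative)."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    left = u[:, :rank_r] * s[:rank_r]   # (m, r), columns scaled by singular values
    right = vt[:rank_r, :]              # (r, n)
    return left, right

def adam_state_bytes(*param_shapes, dtype=np.float32) -> int:
    """Adam keeps two moment buffers (first and second moments) per parameter tensor."""
    itemsize = np.dtype(dtype).itemsize
    return sum(2 * int(np.prod(shape)) * itemsize for shape in param_shapes)

m, n, r = 1024, 1024, 64
grad = np.random.randn(m, n).astype(np.float32)
left, right = low_rank_factors(grad, r)

rel_err = np.linalg.norm(grad - left @ right) / np.linalg.norm(grad)
print(f"relative approximation error: {rel_err:.3f}")
print("Adam state for the full gradient:   ", adam_state_bytes((m, n)), "bytes")
print("Adam state for the low-rank factors:", adam_state_bytes((m, r), (r, n)), "bytes")
```

With m = n = 1024 and r = 64, the moment buffers shrink from 2*1024*1024 entries to 2*64*(1024+1024) entries, which is the kind of saving that makes Adam-style optimization feasible under on-device memory budgets.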
