Paper Title
SparCL: Sparse Continual Learning on the Edge
Paper Authors
Paper Abstract
Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), which is the first study that leverages sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods, requiring up to 23X fewer training FLOPs, and, surprisingly, further improves SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained by adapting SOTA sparse training methods to the CL setting in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.
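To make the three aspects concrete, the sketch below is a minimal, hypothetical PyTorch illustration of the general ideas the abstract names: a magnitude-based weight mask (weight sparsity), a gradient hook that confines updates to unpruned weights (gradient sparsity), and loss-based example selection (data efficiency). All names here (`magnitude_mask`, `MaskedLinear`, `select_informative`) and the pruning/keep heuristics are assumptions for illustration only, not the paper's actual TDM/DDR/DGM implementation.

```python
# Hypothetical sketch of weight sparsity, gradient sparsity, and data selection.
# Not the SparCL implementation; names and heuristics are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights.
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

class MaskedLinear(nn.Module):
    # Linear layer whose forward pass and gradient updates are restricted to a sparse mask.
    def __init__(self, in_features: int, out_features: int, sparsity: float = 0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.register_buffer("mask", magnitude_mask(self.weight.data, sparsity))
        # Gradient sparsity: zero out gradients of pruned weights so updates stay sparse.
        self.weight.register_hook(lambda g: g * self.mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Weight sparsity: only unpruned weights participate in the computation.
        return F.linear(x, self.weight * self.mask, self.bias)

def select_informative(per_example_loss: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    # Data efficiency: keep only the highest-loss (assumed most informative) examples.
    k = max(1, int(per_example_loss.numel() * keep_ratio))
    return per_example_loss.topk(k).indices

# Toy usage: one training step on a batch, dropping the easiest half of the examples.
model = MaskedLinear(20, 5, sparsity=0.9)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
per_example_loss = F.cross_entropy(model(x), y, reduction="none")
idx = select_informative(per_example_loss.detach(), keep_ratio=0.5)
loss = per_example_loss[idx].mean()
opt.zero_grad()
loss.backward()
opt.step()
```

Note that in the method described by the abstract the masks and the fraction of removed data are dynamic and task-aware over the CL process; this static sketch does not capture that behavior.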