Paper Title

Unifying Regularisation Methods for Continual Learning

Paper Authors

Benzing, Frederik

Paper Abstract

Continual Learning addresses the challenge of learning a number of different tasks sequentially. The goal of maintaining knowledge of earlier tasks without re-accessing them starkly conflicts with standard SGD training of artificial neural networks. An influential class of methods that tackle this problem without storing old data are the so-called regularisation approaches. They measure the importance of each parameter for solving a given task and subsequently protect important parameters from large changes. In the literature, three ways to measure parameter importance have been put forward, and they have inspired a large body of follow-up work. Here, we present strong theoretical and empirical evidence that these three methods, Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI) and Memory Aware Synapses (MAS), are surprisingly similar and are all linked to the same theoretical quantity. Concretely, we show that, despite stemming from very different motivations, both SI and MAS approximate the square root of the Fisher Information, with the Fisher being the theoretically justified basis of EWC. Moreover, we show that for SI the relation to the Fisher -- and in fact its performance -- is due to a previously unknown bias. On top of uncovering unknown similarities and unifying regularisation approaches, we also demonstrate that our insights enable practical performance improvements for large batch training.
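To make the abstract's central quantity concrete, the sketch below estimates the diagonal of the empirical Fisher Information for a toy logistic-regression model (average of squared per-sample log-likelihood gradients) and uses it in an EWC-style quadratic penalty that discourages changes to important parameters. This is a minimal illustration under assumed toy data, not the paper's implementation; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: logistic regression with 5 parameters,
# theta_star playing the role of the parameters found on task A.
X = rng.normal(size=(100, 5))
theta_star = rng.normal(size=5)
probs = 1.0 / (1.0 + np.exp(-X @ theta_star))
y = (rng.random(100) < probs).astype(float)  # labels drawn from the model

# Diagonal empirical Fisher: mean squared per-sample gradient of the
# log-likelihood. For logistic regression this gradient is (y - p) * x.
per_sample_grads = (y - probs)[:, None] * X        # shape (N, 5)
fisher_diag = np.mean(per_sample_grads ** 2, axis=0)

def ewc_penalty(theta, theta_star, importance, lam=1.0):
    """EWC-style quadratic penalty: lam * sum_i F_i * (theta_i - theta*_i)^2."""
    return lam * np.sum(importance * (theta - theta_star) ** 2)

# After drifting towards a new task, important parameters are penalised more.
theta_new = theta_star + 0.1 * rng.normal(size=5)
print(ewc_penalty(theta_new, theta_star, fisher_diag))
```

The paper's claim is that the importance weights produced by SI and MAS approximate the square root of `fisher_diag` above, so all three methods end up penalising changes along essentially the same directions.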
