Paper title
How catastrophic can catastrophic forgetting be in linear regression?
Paper authors
Paper abstract
To better understand catastrophic forgetting, we study fitting an overparameterized linear model to a sequence of tasks with different input distributions. We analyze how much the model forgets the true labels of earlier tasks after training on subsequent tasks, obtaining exact expressions and bounds. We establish connections between continual learning in the linear setting and two other research areas: alternating projections and the Kaczmarz method. In specific settings, we highlight differences between forgetting and convergence to the offline solution as studied in those areas. In particular, when T tasks in d dimensions are presented cyclically for k iterations, we prove an upper bound of T^2 * min{1/sqrt(k), d/k} on the forgetting. This stands in contrast to the convergence to the offline solution, which can be arbitrarily slow according to existing alternating projection results. We further show that the T^2 factor can be lifted when tasks are presented in a random ordering.
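The setting in the abstract can be illustrated with a short simulation. The sketch below is a minimal, hypothetical example (the dimensions d, number of tasks T, samples per task, and number of cycles are arbitrary choices, not values from the paper): each task is fit exactly with a minimum-norm update from the previous weights, which is the projection step that underlies the abstract's connection to the Kaczmarz method and alternating projections, and forgetting is then measured as the average error on all tasks after training.

```python
import numpy as np

rng = np.random.default_rng(0)

d, T, n_per_task, k_cycles = 50, 5, 10, 20   # overparameterized: n_per_task < d

# Ground-truth linear model shared by all tasks (realizable setting)
w_star = rng.normal(size=d)

# Each task has its own inputs (here simply random Gaussian features)
tasks = []
for _ in range(T):
    X = rng.normal(size=(n_per_task, d))
    tasks.append((X, X @ w_star))

w = np.zeros(d)
for _ in range(k_cycles):
    for X, y in tasks:                       # cyclic task ordering
        # Minimum-norm update that fits the current task exactly:
        # project w onto {v : X v = y} (a block-Kaczmarz-style step)
        w = w + np.linalg.pinv(X) @ (y - X @ w)

# Forgetting: average squared error over all tasks after the final update
forgetting = np.mean([np.mean((X @ w - y) ** 2) for X, y in tasks])
print(f"forgetting after {k_cycles} cycles: {forgetting:.3e}")
```

Under this kind of simulation, the forgetting shrinks as the number of cycles k grows, which is the quantity the abstract's T^2 * min{1/sqrt(k), d/k} bound controls; the distance of w to the offline (joint) solution can decay much more slowly.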