Paper Title
Exemplar-free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation
Paper Authors
Paper Abstract
We propose a new method for exemplar-free class-incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining the plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay, which can help recalibrate previous task classifiers to the feature drift that occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks, which for many applications may not be possible. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to the class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Secondly, we propose cascaded feature drift compensation, a new method that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal-based ViT methods.
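The abstract's core idea can be illustrated with a minimal sketch: in class-attention, only the class token queries the patch tokens, and a sigmoid gate modulates the result so that weights important for earlier tasks can be suppressed from drifting. This is an illustrative NumPy sketch, not the paper's exact design; the shapes, the per-dimension gate, and all variable names (`Wq`, `Wk`, `Wv`, `gate_logits`) are assumptions made for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_class_attention(cls_token, patch_tokens, Wq, Wk, Wv, gate_logits):
    """Illustrative gated class-attention (single head).

    cls_token:    (1, d)  -- only the class token forms queries
    patch_tokens: (n, d)  -- keys/values come from patch tokens
    gate_logits:  (d,)    -- learnable logits; sigmoid gives a gate in [0, 1]
                             that attenuates output dimensions tied to old tasks
    """
    q = cls_token @ Wq                                # (1, d)
    k = patch_tokens @ Wk                             # (n, d)
    v = patch_tokens @ Wv                             # (n, d)
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))    # (1, n) attention weights
    gate = 1.0 / (1.0 + np.exp(-gate_logits))         # (d,) sigmoid gate
    return (attn @ v) * gate                          # gated class-token output
```

During continual training, one would freeze or heavily regularize gate dimensions that were important for previous tasks, allowing the remaining dimensions to adapt to new classes without a task-ID at inference time.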