Paper Title
Task Adaptive Parameter Sharing for Multi-Task Learning
Paper Authors
Paper Abstract
Adapting pre-trained models with broad capabilities has become standard practice for learning a wide range of downstream tasks. The typical approach of fine-tuning a separate model for each task is performant, but incurs a substantial memory cost. To efficiently learn multiple downstream tasks, we introduce Task Adaptive Parameter Sharing (TAPS), a general method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. This enables multi-task learning while minimizing the resources used and the competition between tasks. TAPS solves a joint optimization problem that determines which layers to share with the base model and the values of the task-specific weights. Further, a sparsity penalty on the number of active layers encourages weight sharing with the base model. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. Moreover, TAPS is agnostic to the model architecture and requires only minor changes to the training scheme. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
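The abstract describes layers that are either shared with the frozen base model or given task-specific weights, with both the per-layer choice and the task-specific values learned jointly under a sparsity penalty on the number of active layers. Below is a minimal PyTorch sketch of that idea; the class name `TAPSLinear`, the sigmoid-relaxed gate, and the `sparsity_penalty` helper are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption, not the paper's code) of a task-adaptive layer:
# base weights stay frozen and shared, a gated task-specific residual can
# modify them, and a penalty on the gates keeps most layers shared.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TAPSLinear(nn.Module):
    """Linear layer whose weights are either shared with the base model or task-adapted."""

    def __init__(self, base_layer: nn.Linear):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():
            p.requires_grad_(False)                        # base model is frozen and shared
        self.delta = nn.Parameter(torch.zeros_like(base_layer.weight))
        self.gate_logit = nn.Parameter(torch.zeros(()))    # relaxed "make this layer task-specific?" score

    def gate(self) -> torch.Tensor:
        return torch.sigmoid(self.gate_logit)               # soft relaxation of the binary choice

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.base.weight + self.gate() * self.delta
        return F.linear(x, weight, self.base.bias)


def sparsity_penalty(model: nn.Module, lam: float = 1e-2) -> torch.Tensor:
    """Penalize the (relaxed) number of active, task-specific layers."""
    gates = [m.gate() for m in model.modules() if isinstance(m, TAPSLinear)]
    return lam * torch.stack(gates).sum()


# Usage: wrap base layers, then train deltas and gates jointly with the task loss.
layer = TAPSLinear(nn.Linear(128, 64))
x = torch.randn(8, 128)
task_loss = layer(x).pow(2).mean()                           # stand-in for the real task loss
loss = task_loss + sparsity_penalty(layer)
loss.backward()
```

The design choice this sketch illustrates is that only `delta` and `gate_logit` receive gradients, so a layer whose gate is driven toward zero contributes (almost) no task-specific parameters and effectively remains shared with the base model.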