Paper Title
Extreme Multi-Domain, Multi-Task Learning With Unified Text-to-Text Transfer Transformers
Paper Authors
Paper Abstract
Text-to-text transformers have shown remarkable success in multi-task transfer learning, especially in natural language processing (NLP). However, while there have been several attempts to train transformers on different domains, those domains usually have a clear relationship, e.g., code summarization, where a natural language summary describes the code. Very few attempts have been made to study how multi-task transfer learning behaves when the tasks come from significantly different domains. In this project, we investigate the behavior of multi-domain, multi-task learning using a multi-domain text-to-text transfer transformer (MD-T5) on four tasks across two domains: Python code and chess. We carry out extensive experiments with three popular training strategies: BERT-style joint pretraining + successive finetuning, GPT-style joint pretraining + successive finetuning, and GPT-style joint pretraining + joint finetuning. We also evaluate the models on four metrics: Play Score, Eval Score, BLEU Score, and Multi-Domain Learning Score (MDLS), which together measure performance across the individual tasks and the degree of multi-domain learning. We show that while negative knowledge transfer and catastrophic forgetting remain considerable challenges for all the models, the GPT-style joint pretraining + joint finetuning strategy shows the most promise for multi-domain, multi-task learning, as it performs well across all four tasks while still retaining its multi-domain knowledge.
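The contrast between successive and joint finetuning can be made concrete with a short sketch. The snippet below is not the authors' implementation: the function names, the toy task data, and the four task labels are hypothetical stand-ins (the abstract does not enumerate the tasks), and the "model" is just a log of what data it saw. It only illustrates how successive finetuning visits each task's data in turn while joint finetuning mixes tasks within a single run.

```python
# Minimal sketch (assumed, not the paper's code) of the two finetuning regimes.
import random


def pretrain(model, corpora):
    # Joint pretraining: mix unlabeled text from every domain
    # (e.g. Python source files and chess game records) into one stream.
    for text in corpora:
        model["seen"].append(("pretrain", text))


def finetune_step(model, task, example):
    # Stand-in for one supervised gradient step on an (input, target) pair.
    model["seen"].append((task, example))


def successive_finetuning(model, tasks):
    # Finetune on each task in turn; later tasks can overwrite earlier ones,
    # which is one way catastrophic forgetting shows up.
    for task, data in tasks.items():
        for example in data:
            finetune_step(model, task, example)


def joint_finetuning(model, tasks, steps=8, seed=0):
    # Sample a task uniformly at every step, so no single task dominates the
    # end of training; this is the regime the abstract reports as most promising.
    rng = random.Random(seed)
    names = list(tasks)
    for _ in range(steps):
        task = rng.choice(names)
        finetune_step(model, task, rng.choice(tasks[task]))


if __name__ == "__main__":
    # Hypothetical task names; the paper only states four tasks over two domains.
    tasks = {
        "code_task_1": ["def add(a, b): return a + b"],
        "code_task_2": ["adds two numbers"],
        "chess_task_1": ["e2e4 e7e5 g1f3"],
        "chess_task_2": ["+0.3"],
    }
    model = {"seen": []}
    pretrain(model, ["python corpus", "chess corpus"])
    joint_finetuning(model, tasks)
    print(model["seen"][-3:])
```

Interleaving tasks per step, rather than per phase, removes the ordering effect that lets the last-seen task overwrite earlier ones, which is consistent with the abstract's finding that the joint finetuning strategy best retains multi-domain knowledge.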