Paper Title
Using Selective Masking as a Bridge between Pre-training and Fine-tuning
Paper Authors
Paper Abstract
Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of-the-art results across various NLP tasks. Pre-training is usually independent of the downstream task, and previous work has shown that pre-training alone may not be sufficient to capture task-specific nuances. We propose a way to tailor a pre-trained BERT model to the downstream task via task-specific masking before the standard supervised fine-tuning. For this, a word list specific to the task is first collected. For example, if the task is sentiment classification, we collect a small sample of words representing positive and negative sentiments. Next, each word's importance for the task, called its task score, is measured using the word list. Each word is then assigned a masking probability based on its task score. We experiment with different masking functions that map a word's task score to its masking probability. The BERT model is further trained on the MLM objective, with masking performed using the above strategy. Following this, standard supervised fine-tuning is done for the different downstream tasks. Results on these tasks show that the selective masking strategy outperforms random masking, indicating its effectiveness.
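The abstract describes a pipeline of word list, task score, masking function, and MLM training, but does not specify the exact scoring or masking functions. The sketch below is a minimal illustration of that pipeline under stated assumptions: a toy binary task score (word-list membership), a linear masking function interpolating between the standard 15% MLM rate and a higher rate for task-relevant words, whitespace tokenization, and illustrative names (`TASK_WORD_LIST`, `task_score`, `masking_probability`, `selectively_mask`) that do not come from the paper.

```python
import random

# Hypothetical sentiment word list; the paper collects a small task-specific
# sample of positive and negative words, and these entries are illustrative only.
TASK_WORD_LIST = {"good", "great", "excellent", "bad", "terrible", "awful"}


def task_score(word, word_list):
    """Toy task score: 1.0 if the word is in the task word list, else 0.0.
    The paper measures importance using the word list; this binary scoring
    is an assumption made for the sketch."""
    return 1.0 if word.lower() in word_list else 0.0


def masking_probability(score, base_p=0.15, max_p=0.5):
    """One possible masking function: interpolate linearly between the
    standard MLM masking rate and a higher rate for task-relevant words."""
    return base_p + (max_p - base_p) * score


def selectively_mask(tokens, word_list, mask_token="[MASK]", seed=None):
    """Replace tokens with the mask token according to their task-dependent
    masking probabilities; return the masked sequence and MLM labels."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        p = masking_probability(task_score(tok, word_list))
        if rng.random() < p:
            masked.append(mask_token)
            labels.append(tok)      # original token is the prediction target
        else:
            masked.append(tok)
            labels.append(None)     # excluded from the MLM loss
    return masked, labels


if __name__ == "__main__":
    sentence = "the movie was great but the ending felt terrible".split()
    print(selectively_mask(sentence, TASK_WORD_LIST, seed=0))
```

In this sketch, task-relevant words are masked more often than the rest of the vocabulary, which is the core idea of selective masking; a real implementation would plug the resulting masked sequences and labels into continued MLM training of the BERT model before supervised fine-tuning.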