Paper Title

Data Annealing for Informal Language Understanding Tasks

Authors

Jing Gu, Zhou Yu

Abstract


There is a huge performance gap between formal and informal language understanding tasks. The recent pre-trained models that improved the performance of formal language understanding tasks did not achieve comparable results on informal language. We propose a data annealing transfer learning procedure to bridge the performance gap on informal natural language understanding tasks. It successfully utilizes a pre-trained model such as BERT on informal language. In our data annealing procedure, the training set contains mainly formal text data at first; then, the proportion of informal text data is gradually increased during the training process. Our data annealing procedure is model-independent and can be applied to various tasks. We validate its effectiveness in exhaustive experiments. When BERT is implemented with our learning procedure, it outperforms all the state-of-the-art models on three common informal language tasks.
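The abstract describes the core mechanism: training batches start out dominated by formal text, and the share of informal text grows as training proceeds. The following is a minimal Python sketch of such a schedule, assuming a simple linear ramp; the names (informal_ratio, sample_annealed_batch, formal_data, informal_data) are illustrative placeholders, not the paper's actual implementation, and the paper's exact annealing schedule may differ.

import random

def informal_ratio(step, total_steps, start=0.1, end=1.0):
    # Proportion of informal examples at a given training step.
    # Assumes a linear schedule from `start` to `end` (illustrative choice).
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress

def sample_annealed_batch(formal_data, informal_data, batch_size, step, total_steps):
    # Draw a mixed batch whose informal share grows as training proceeds.
    ratio = informal_ratio(step, total_steps)
    n_informal = round(batch_size * ratio)
    n_formal = batch_size - n_informal
    batch = random.sample(informal_data, n_informal) + random.sample(formal_data, n_formal)
    random.shuffle(batch)
    return batch

Early in training the sampled batches are mostly formal text (e.g., with start=0.1, only about 10% informal), and by the final steps they are drawn almost entirely from the informal corpus, which matches the gradual increase described in the abstract.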
