Translation Artifacts in Cross-lingual Transfer Learning

Paper title

Translation Artifacts in Cross-lingual Transfer Learning

Authors

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Abstract

Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique. In this paper, we show that this translation process can introduce subtle artifacts that have a notable impact on existing cross-lingual models. For instance, in natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them, which current models are highly sensitive to. We show that some previous findings in cross-lingual transfer learning need to be reconsidered in light of this phenomenon. Based on the insights gained, we also improve the state of the art on XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
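
To make the lexical-overlap effect mentioned in the abstract concrete, here is a minimal sketch. It assumes Jaccard similarity over lowercased whitespace tokens as the overlap measure, which is an illustrative choice and not necessarily the measure used in the paper. The function name `lexical_overlap` and both sentence pairs are hypothetical, made up for illustration.

```python
def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Jaccard overlap between the lowercased whitespace tokens of two sentences.

    Assumption: Jaccard similarity stands in for "lexical overlap" here;
    the paper may define or measure it differently.
    """
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    if not p and not h:
        return 0.0
    return len(p & h) / len(p | h)


# Hypothetical pair where the hypothesis reuses the premise's wording.
original = lexical_overlap(
    "a man is playing a guitar",
    "a man is playing an instrument",
)

# Hypothetical pair imitating a hypothesis that was translated independently
# of the premise, so shared words are replaced by synonyms or other forms.
translated = lexical_overlap(
    "a man is playing a guitar",
    "a guy plays an instrument",
)

print(f"overlap, original wording:        {original:.3f}")
print(f"overlap, independently reworded:  {translated:.3f}")
```

In this toy case the second pair shares far fewer tokens with the premise even though its meaning is unchanged, which is the kind of surface shift the abstract says current models are sensitive to.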
