Paper Title
On Optimal Transformer Depth for Low-Resource Language Translation
Paper Authors
Paper Abstract
Transformers have shown great promise as an approach to Neural Machine Translation (NMT) for low-resource languages. At the same time, however, transformer models remain difficult to optimize and require careful tuning of hyper-parameters to be useful in this setting. Many NMT toolkits come with a set of default hyper-parameters, which researchers and practitioners often adopt for the sake of convenience and to avoid tuning. These configurations, however, have been optimized for large-scale machine translation datasets with several million parallel sentences for European languages like English and French. In this work, we find that the current trend in the field toward very large models is detrimental for low-resource languages, since it makes training more difficult and hurts overall performance, confirming previous observations. We see our work as complementary to the Masakhane project ("Masakhane" means "We Build Together" in isiZulu). In this spirit, low-resource NMT systems are now being built by the communities that need them most. However, many in these communities still have very limited access to the kind of computational resources required to build the extremely large models promoted by industrial research. Therefore, by showing that transformer models perform well (and often best) at low-to-moderate depth, we hope to convince fellow researchers to devote less computational resources, as well as time, to exploring overly large models when developing these systems.
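To make the resource argument concrete, the sketch below roughly estimates how the weight count of an encoder-decoder transformer grows with depth, using the standard "base" dimensions (d_model=512, d_ff=2048) as an assumption. The helper `transformer_params` is hypothetical and counts only the attention and feed-forward projection matrices per layer, ignoring embeddings, biases, and layer norms, so it understates real totals; it is meant only to illustrate that parameter count scales linearly with depth.

```python
def transformer_params(num_layers, d_model=512, d_ff=2048):
    """Rough per-depth weight count for an encoder-decoder transformer.

    Counts only the projection matrices: each attention block has four
    d_model x d_model projections (Q, K, V, output); each feed-forward
    block has two d_model x d_ff projections. Embeddings, biases, and
    layer norms are deliberately ignored in this back-of-envelope sketch.
    """
    attn = 4 * d_model * d_model          # one attention block
    ffn = 2 * d_model * d_ff              # one feed-forward block
    encoder_layer = attn + ffn            # self-attention + FFN
    decoder_layer = 2 * attn + ffn        # self-attention + cross-attention + FFN
    return num_layers * (encoder_layer + decoder_layer)

# Default depth (6 encoder/decoder layers) vs. a shallower 3-layer model:
deep = transformer_params(6)
shallow = transformer_params(3)
print(f"6 layers: {deep / 1e6:.1f}M weights")    # 6 layers: 44.0M weights
print(f"3 layers: {shallow / 1e6:.1f}M weights")  # 3 layers: 22.0M weights
```

Halving the depth halves these core weights, which directly reduces memory use and per-step compute during training — the practical saving the abstract argues for.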