Paper Title

Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation

Authors

Sreyashi Nag, Mihir Kale, Varun Lakshminarasimhan, Swapnil Singhavi

Abstract

We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have been successful at leveraging target-side monolingual data. However, because the quality of a back-translation model is tied to the size of the available parallel corpus, the synthetically generated sentences can suffer in a low-resource setting. We propose a simple data augmentation technique to address this shortcoming. We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences. This automatically expands the vocabulary of the model while maintaining high-quality content. Our method shows an appreciable improvement in performance over strong baselines.
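The word-by-word augmentation described in the abstract might look roughly like the sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the translation direction, the tokenization, or how out-of-dictionary words are handled. Here the dictionary is assumed to map target-language words to source-language words, tokens are split on whitespace, and unknown words are copied through unchanged. The function names and the toy dictionary are hypothetical.

```python
def word_by_word_translate(sentence, dictionary):
    """Translate a whitespace-tokenized sentence one word at a time."""
    tokens = sentence.split()
    # Assumption: words missing from the dictionary are copied unchanged.
    return " ".join(dictionary.get(tok.lower(), tok) for tok in tokens)


def make_synthetic_pairs(monolingual_target, dictionary):
    """Pair each real target sentence with a synthetic word-by-word source."""
    return [(word_by_word_translate(t, dictionary), t) for t in monolingual_target]


# Toy German-to-English dictionary, invented purely for illustration.
toy_dict = {"das": "the", "haus": "house", "ist": "is", "klein": "small"}
pairs = make_synthetic_pairs(["Das Haus ist klein"], toy_dict)
```

In a semi-supervised setup, pairs like these would be mixed with the real parallel corpus as extra training data, in the same role that back-translated sentences usually play.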
