Title
Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation
Authors
Abstract
We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have shown success in leveraging target-side monolingual data. However, since the quality of back-translation models is tied to the size of the available parallel corpora, this could adversely impact the synthetically generated sentences in a low-resource setting. We propose a simple data augmentation technique to address this shortcoming. We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences. This automatically expands the vocabulary of the model while maintaining high-quality content. Our method shows an appreciable improvement in performance over strong baselines.
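The dictionary-based augmentation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `word_by_word_translate` and `augment_corpus` are hypothetical. The sketch also assumes whitespace tokenization, case-insensitive lookup, a single source word per target word, and that words missing from the dictionary are copied unchanged. Each synthetic source sentence is paired with its real target sentence, mirroring how back-translated pairs are added to the parallel corpus.

```python
from typing import Dict, List, Tuple


def word_by_word_translate(sentence: str, tgt_to_src: Dict[str, str]) -> str:
    """Turn a target-side sentence into a synthetic source sentence, token by token.

    Assumptions (hypothetical, not from the paper): whitespace tokenization,
    case-insensitive lookup, one source word per entry, and copying any word
    missing from the dictionary unchanged.
    """
    out = []
    for token in sentence.split():
        out.append(tgt_to_src.get(token.lower(), token))
    return " ".join(out)


def augment_corpus(
    parallel: List[Tuple[str, str]],
    target_monolingual: List[str],
    tgt_to_src: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Append (synthetic source, real target) pairs built from monolingual target text."""
    synthetic = [(word_by_word_translate(t, tgt_to_src), t) for t in target_monolingual]
    return parallel + synthetic


if __name__ == "__main__":
    # Toy German -> English setup: the model translates de -> en, so the
    # (hypothetical) dictionary maps target (English) words back to German.
    dictionary = {"the": "das", "house": "haus", "is": "ist", "small": "klein"}
    parallel = [("das ist gut", "that is good")]
    mono_en = ["The house is small", "the house is green"]
    for src, tgt in augment_corpus(parallel, mono_en, dictionary):
        print(src, "|||", tgt)
```

Because the lookup does not rely on a trained back-translation model, the quality of these synthetic pairs does not depend on the size of the parallel corpus, which is the shortcoming the abstract targets. The trade-off is that word-by-word output ignores reordering and morphology, and words absent from the dictionary (such as "green" above) pass through untranslated.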