Paper Title
Semi-Autoregressive Training Improves Mask-Predict Decoding
Paper Authors
Paper Abstract
The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of their inputs. Models trained with SMART produce higher-quality translations when using mask-predict decoding, effectively closing the remaining performance gap with fully autoregressive models.
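The mask-predict procedure referenced above can be illustrated with a minimal sketch: start with every target position masked, predict all masked positions in parallel, then re-mask the lowest-confidence tokens, with the number of re-masked tokens shrinking linearly over the iterations. This is an illustrative toy, not the paper's implementation; the `model` interface and `toy_model` scorer below are hypothetical stand-ins for a real conditional masked language model.

```python
MASK = "<mask>"

def mask_predict(model, length, iterations):
    """Toy sketch of mask-predict decoding.

    `model` is assumed (hypothetically) to map a partially masked token
    list to (predictions, confidences), one per position.
    """
    tokens = [MASK] * length
    conf = [0.0] * length
    for t in range(iterations):
        preds, probs = model(tokens)
        # Fill in only the currently masked positions.
        for i, tok in enumerate(tokens):
            if tok == MASK:
                tokens[i], conf[i] = preds[i], probs[i]
        # Linearly decaying re-masking schedule.
        n_mask = int(length * (1 - (t + 1) / iterations))
        if n_mask == 0:
            break
        # Re-mask the n_mask lowest-confidence positions.
        worst = sorted(range(length), key=lambda i: conf[i])[:n_mask]
        for i in worst:
            tokens[i] = MASK
    return tokens

def toy_model(tokens):
    # Hypothetical stand-in: always predicts a fixed target, with
    # confidence growing as more context is unmasked.
    target = ["the", "cat", "sat", "down"]
    context = sum(t != MASK for t in tokens) / len(tokens)
    return target[:], [0.5 + 0.5 * context] * len(tokens)

print(mask_predict(toy_model, 4, 3))
```

The key point SMART addresses is visible in the loop: at every iteration after the first, the model conditions on its own earlier predictions, a situation ordinary masked-language-model training (which conditions only on gold tokens) never exposes it to.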