Paper Title

Learning Accurate Integer Transformer Machine-Translation Models

Paper Authors

Wu, Ephrem

Paper Abstract

We describe a method for training accurate Transformer machine-translation models to run inference using 8-bit integer (INT8) hardware matrix multipliers, as opposed to the more costly single-precision floating-point (FP32) hardware. Unlike previous work, which converted only 85 Transformer matrix multiplications to INT8, leaving 48 out of 133 of them in FP32 because of unacceptable accuracy loss, we convert them all to INT8 without compromising accuracy. Tested on the newstest2014 English-to-German translation task, our INT8 Transformer Base and Transformer Big models yield BLEU scores that are 99.3% to 100% relative to those of the corresponding FP32 models. Our approach converts all matrix-multiplication tensors from an existing FP32 model into INT8 tensors by automatically making range-precision trade-offs during training. To demonstrate the robustness of this approach, we also include results from INT6 Transformer models.
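To make the range-precision trade-off concrete, below is a minimal Python sketch of symmetric per-tensor fake-quantization, which emulates an INT8 matrix multiply in FP32 arithmetic. The function name `fake_quantize` and the fixed clipping threshold are illustrative assumptions for this sketch only; the paper's method chooses such trade-offs automatically during training rather than fixing them by hand as shown here.

```python
# Minimal sketch: symmetric per-tensor integer fake-quantization used to
# emulate an INT8 matrix multiply. The clipping threshold `clip` and the
# rounding scheme are illustrative assumptions, not the paper's mechanism.
import numpy as np

def fake_quantize(x: np.ndarray, clip: float, num_bits: int = 8) -> np.ndarray:
    """Quantize x to signed `num_bits`-bit integers, then dequantize to FP32.

    A larger `clip` covers more of the tensor's dynamic range but with a
    coarser step size (less precision); a smaller `clip` gives finer steps
    but saturates outliers. This is the range-precision trade-off.
    """
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 127 for INT8
    scale = clip / qmax                             # FP32 value per integer step
    q = np.clip(np.round(x / scale), -qmax, qmax)   # integer codes in [-qmax, qmax]
    return q * scale                                # dequantized FP32 approximation

# Example: emulate an INT8 matrix multiply y = W @ x with both operands quantized.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal((8,)).astype(np.float32)
y_int8 = fake_quantize(W, clip=np.abs(W).max()) @ fake_quantize(x, clip=np.abs(x).max())
y_fp32 = W @ x
print(np.max(np.abs(y_int8 - y_fp32)))              # small quantization error
```

The same routine applies to INT6 by setting `num_bits=6`, which shrinks `qmax` to 31 and makes the choice of clipping range correspondingly more sensitive.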
