Paper Title

On Sparsifying Encoder Outputs in Sequence-to-Sequence Models

Paper Authors

Biao Zhang, Ivan Titov, Rico Sennrich

Paper Abstract

Sequence-to-sequence models usually transfer all encoder outputs to the decoder for generation. In this work, by contrast, we hypothesize that these encoder outputs can be compressed to shorten the sequence delivered for decoding. We take Transformer as the testbed and introduce a layer of stochastic gates between the encoder and the decoder. The gates are regularized using the expected value of the sparsity-inducing L0 penalty, resulting in completely masking out a subset of encoder outputs. In other words, via joint training, the L0DROP layer forces Transformer to route information through a subset of its encoder states. We investigate the effects of this sparsification on two machine translation and two summarization tasks. Experiments show that, depending on the task, around 40-70% of source encodings can be pruned without significantly compromising quality. The decrease of the output length endows L0DROP with the potential of improving decoding efficiency, where it yields a speedup of up to 1.65x on document summarization tasks against the standard Transformer. We analyze the L0DROP behaviour and observe that it exhibits systematic preferences for pruning certain word types, e.g., function words and punctuation get pruned most. Inspired by these observations, we explore the feasibility of specifying rule-based patterns that mask out encoder outputs based on information such as part-of-speech tags, word frequency and word position.
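To make the gating idea concrete, below is a minimal PyTorch sketch of a stochastic gate layer regularized with an expected-L0 penalty, in the style of Hard Concrete gates. It is an illustration of the general technique described in the abstract, not the authors' released implementation: the class and parameter names (HardConcreteGate, gate_proj, beta, gamma, zeta) and the hyperparameter values are assumptions made for this example.

```python
import math

import torch
import torch.nn as nn


class HardConcreteGate(nn.Module):
    """Sketch of a stochastic gate layer with an expected-L0 penalty.

    Illustrative only (not the paper's code): each encoder state receives a
    gate in [0, 1] sampled from a stretched Hard Concrete distribution; the
    expected L0 norm of the gates is added to the training loss, so some
    gates collapse to exactly zero and the corresponding encoder outputs can
    be dropped before decoding.
    """

    def __init__(self, d_model: int, beta: float = 0.5,
                 gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        # Predicts the log-alpha gate parameter from each encoder state.
        self.gate_proj = nn.Linear(d_model, 1)
        self.beta, self.gamma, self.zeta = beta, gamma, zeta  # illustrative values

    def forward(self, enc_out: torch.Tensor):
        # enc_out: (batch, src_len, d_model)
        log_alpha = self.gate_proj(enc_out).squeeze(-1)  # (batch, src_len)

        if self.training:
            # Reparameterized sample from the binary Concrete distribution.
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / self.beta)
        else:
            s = torch.sigmoid(log_alpha)

        # Stretch to (gamma, zeta) and clip to [0, 1] so exact zeros can occur.
        gate = torch.clamp(s * (self.zeta - self.gamma) + self.gamma, 0.0, 1.0)

        # Expected L0 norm: probability that each gate is non-zero.
        expected_l0 = torch.sigmoid(
            log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()

        return enc_out * gate.unsqueeze(-1), gate, expected_l0
```

In such a setup, the expected_l0 term would be weighted and added to the task loss during joint training; at inference, source states whose gate is exactly zero could be removed from the sequence the decoder attends to, which is where a decoding speedup of the kind reported in the abstract would come from.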
