Non-Autoregressive ASR with Self-Conditioned Folded Encoders

Paper title

Non-Autoregressive ASR with Self-Conditioned Folded Encoders

Author

Komatsu, Tatsuya

Abstract

This paper proposes a CTC-based non-autoregressive ASR model with self-conditioned folded encoders. The proposed method realizes non-autoregressive ASR with fewer parameters by folding the conventional stack of encoders into only two blocks: base encoders and folded encoders. The base encoders convert the input audio features into a neural representation suitable for recognition. The folded encoders are then applied repeatedly to refine that representation. Applying the CTC loss to the outputs of all encoders enforces a consistent input-output relationship, so the folded encoders learn to perform the same operations as an encoder with deeper, distinct layers. The experiments investigate how to set the number of layers and the number of iterations for the base and folded encoders. The results show that the proposed method achieves performance comparable to that of the conventional method while using only 38% as many parameters. Furthermore, it outperforms the conventional method when the number of iterations is increased.
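The folding idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: `make_layer`, `folded_forward`, `unique_layers`, and `effective_depth` are hypothetical names, each "layer" is a toy scalar function standing in for a real encoder block, and the self-conditioning step and the CTC loss itself are omitted because the abstract does not specify them. The sketch only shows that the base layers run once, the folded layers are reused on every iteration, and every block output is collected so that a CTC loss could be applied to each of them.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float) -> Layer:
    """Hypothetical stand-in for one encoder layer (toy affine map, not a real block)."""
    def layer(h: Vector) -> Vector:
        return [scale * v + 1.0 for v in h]
    return layer


def folded_forward(
    x: Vector,
    base_layers: List[Layer],
    folded_layers: List[Layer],
    num_iterations: int,
) -> List[Vector]:
    """Run the base layers once, then reuse the folded layers num_iterations times.

    Returns the output of the base block followed by the output of every
    folded iteration; in training, each of these would receive a CTC loss.
    """
    h = x
    for layer in base_layers:
        h = layer(h)
    outputs = [h]
    for _ in range(num_iterations):
        # The same layer objects are applied again: weights are shared across iterations.
        for layer in folded_layers:
            h = layer(h)
        outputs.append(h)
    return outputs


def unique_layers(num_base: int, num_folded: int) -> int:
    """Number of distinct layers that carry parameters."""
    return num_base + num_folded


def effective_depth(num_base: int, num_folded: int, num_iterations: int) -> int:
    """Number of layer applications seen by the input."""
    return num_base + num_folded * num_iterations


# Toy configuration (assumed for illustration, not taken from the paper).
base = [make_layer(0.5), make_layer(0.5)]
folded = [make_layer(0.5)]
outputs = folded_forward([4.0], base, folded, num_iterations=3)
```

In this toy configuration the input passes through five layer applications while only three distinct layers hold parameters, which is the source of the parameter saving. The specific layer counts behind the reported 38% figure are not given in the abstract.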
