Paper Title

Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

Paper Authors

Chendong Zhao, Jianzong Wang, Wenqi Wei, Xiaoyang Qu, Haoqian Wang, Jing Xiao

Paper Abstract

The Transformer architecture model, based on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). However, self-attention and multi-head attention cannot be easily applied to streaming or online ASR. For self-attention in Transformer ASR, the attention mechanism based on the softmax normalization function makes it impossible to highlight important speech information. For multi-head attention in Transformer ASR, it is not easy to model monotonic alignments in different heads. To overcome these two limitations, we integrate sparse attention and monotonic attention into Transformer-based ASR. The sparse mechanism introduces a learned sparsity scheme that enables each self-attention structure to better fit the corresponding head. The monotonic attention deploys regularization to prune redundant heads in the multi-head attention structure. The experiments show that our method can effectively improve the attention mechanism on widely used speech recognition benchmarks.
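
The abstract describes two ideas: replacing the dense softmax normalization with a sparse one, so that a head can assign exactly zero weight to irrelevant frames, and regularizing multi-head attention toward monotonic alignments so that redundant heads can be pruned. The sketch below is illustrative only and is not taken from the paper: it uses sparsemax (Martins & Astudillo, 2016) as one concrete example of a sparse normalizer, and a hypothetical penalty on backward-moving expected alignment positions as a stand-in for a monotonicity regularizer.

```python
import torch


def sparsemax(scores: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparsemax: a sparse alternative to softmax that can assign exactly
    zero probability to low-scoring positions. Illustrative stand-in for
    the learned sparsity scheme mentioned in the abstract."""
    z_sorted, _ = torch.sort(scores, dim=dim, descending=True)
    k = torch.arange(1, scores.size(dim) + 1, device=scores.device, dtype=scores.dtype)
    shape = [1] * scores.dim()
    shape[dim] = -1
    k = k.view(shape)
    z_cumsum = z_sorted.cumsum(dim)
    # Support size: largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    support_size = ((1 + k * z_sorted) > z_cumsum).sum(dim=dim, keepdim=True)
    # Threshold tau chosen so the clamped weights sum to one.
    tau = (torch.gather(z_cumsum, dim, support_size - 1) - 1) / support_size.to(scores.dtype)
    return torch.clamp(scores - tau, min=0.0)


def monotonicity_penalty(attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical regularizer (not the paper's exact term): penalize heads
    whose expected encoder position moves backwards across decoder steps.
    attn: attention weights of shape (batch, heads, dec_len, enc_len)."""
    positions = torch.arange(attn.size(-1), dtype=attn.dtype, device=attn.device)
    expected = (attn * positions).sum(dim=-1)                  # (batch, heads, dec_len)
    backward_steps = (expected[..., :-1] - expected[..., 1:]).clamp(min=0.0)
    return backward_steps.mean()


if __name__ == "__main__":
    # One decoder step attending over six encoder frames.
    scores = torch.tensor([[2.1, 0.3, -1.0, 0.8, -0.5, 1.7]])
    print(sparsemax(scores))            # several weights become exactly 0
    print(torch.softmax(scores, -1))    # softmax keeps every weight > 0

    # Random attention maps: batch=2, heads=4, dec_len=5, enc_len=6.
    attn = torch.softmax(torch.randn(2, 4, 5, 6), dim=-1)
    print(monotonicity_penalty(attn))
```

In this sketch, sparsemax yields exact zeros for weakly scored frames where softmax would still spread mass over every frame, and the penalty term can be added to the training loss so that heads with consistently non-monotonic alignments contribute little and become candidates for pruning.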
