Paper Title

Semi-supervised ASR by End-to-end Self-training

Authors

Yang Chen, Weiran Wang, Chao Wang

Abstract

While deep learning based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from the data sparsity issue. In this work, we propose a self-training method with an end-to-end system for semi-supervised ASR. Starting from a Connectionist Temporal Classification (CTC) system trained on the supervised data, we iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for immediate model update. Our method retains the simplicity of end-to-end ASR systems, and can be seen as performing alternating optimization over a well-defined learning objective. We also perform empirical investigations of our method regarding the effect of data augmentation, the decoding beam size used for pseudo-label generation, and the freshness of pseudo-labels. On a commonly used semi-supervised ASR setting with the WSJ corpus, our method gives 14.4% relative WER improvement over a carefully-trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 50%.
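Since the abstract describes the core loop concretely (pseudo-label an unlabeled mini-batch with the current model, then update immediately on supervised plus pseudo-labeled data), a minimal sketch may help. This is an illustration only, assuming a PyTorch-style setup; the model class `SpeechEncoder`, the greedy decoder (a stand-in for the paper's beam-search decoding), and all data here are hypothetical placeholders, not the authors' implementation.

```python
# Minimal self-training sketch (illustrative only). Hypothetical names:
# SpeechEncoder, greedy_pseudo_labels, ctc_loss_on_batch, BLANK.
import torch
import torch.nn as nn
import torch.nn.functional as F

BLANK = 0  # assumed CTC blank index

class SpeechEncoder(nn.Module):
    """Toy acoustic model: feature frames -> per-frame label logits."""
    def __init__(self, feat_dim=40, hidden=256, vocab=32):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab)

    def forward(self, x):            # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.out(h)           # (batch, time, vocab)

def greedy_pseudo_labels(model, feats):
    """Greedy CTC decoding (stand-in for the paper's beam search):
    take the argmax path, collapse repeats, drop blanks."""
    with torch.no_grad():
        best = model(feats).argmax(-1)        # (batch, time)
    labels = []
    for seq in best:
        prev, out = BLANK, []
        for t in seq.tolist():
            if t != prev and t != BLANK:
                out.append(t)
            prev = t
        # arbitrary non-blank fallback; real code would drop empty hypotheses
        labels.append(torch.tensor(out or [1]))
    return labels

def ctc_loss_on_batch(model, feats, targets):
    logits = model(feats)                                      # (B, T, V)
    log_probs = F.log_softmax(logits, dim=-1).transpose(0, 1)  # (T, B, V)
    in_lens = torch.full((feats.size(0),), logits.size(1), dtype=torch.long)
    tgt_lens = torch.tensor([len(t) for t in targets])
    return F.ctc_loss(log_probs, torch.cat(targets), in_lens, tgt_lens, blank=BLANK)

# One self-training step: pseudo-label an unlabeled mini-batch with the
# *current* model, then update on supervised + pseudo-labeled data together.
model = SpeechEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

sup_feats = torch.randn(4, 100, 40)                     # supervised mini-batch
sup_tgts = [torch.randint(1, 32, (10,)) for _ in range(4)]
unsup_feats = torch.randn(4, 100, 40)                   # unlabeled mini-batch

pseudo_tgts = greedy_pseudo_labels(model, unsup_feats)  # fresh pseudo-labels
loss = (ctc_loss_on_batch(model, sup_feats, sup_tgts)
        + ctc_loss_on_batch(model, unsup_feats, pseudo_tgts))
opt.zero_grad()
loss.backward()
opt.step()
```

Note that the pseudo-labels are regenerated from the current model at each step, which is the "freshness" the abstract refers to; a faithful reproduction would also use beam-search decoding and data augmentation as studied in the paper.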
