Paper Title

Semi-supervised ASR by End-to-end Self-training

Authors

Yang Chen, Weiran Wang, Chao Wang

Abstract

While deep learning based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from the data sparsity issue. In this work, we propose a self-training method with an end-to-end system for semi-supervised ASR. Starting from a Connectionist Temporal Classification (CTC) system trained on the supervised data, we iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for immediate model update. Our method retains the simplicity of end-to-end ASR systems, and can be seen as performing alternating optimization over a well-defined learning objective. We also perform empirical investigations of our method regarding the effect of data augmentation, the decoding beam size used for pseudo-label generation, and the freshness of pseudo-labels. On a commonly used semi-supervised ASR setting with the WSJ corpus, our method gives 14.4% relative WER improvement over a carefully-trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 50%.
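Since the abstract describes the core loop concretely (pseudo-label an unlabeled mini-batch with the current model, then update immediately on supervised plus pseudo-labeled data), a minimal sketch may help. This is an illustration only, assuming a PyTorch-style setup; the model class `SpeechEncoder`, the greedy decoder (a stand-in for the paper's beam-search decoding), and all data here are hypothetical placeholders, not the authors' implementation.

```python
# Minimal self-training sketch (illustrative only). Hypothetical names:
# SpeechEncoder, greedy_pseudo_labels, ctc_loss_on_batch, BLANK.
import torch
import torch.nn as nn
import torch.nn.functional as F

BLANK = 0  # assumed CTC blank index

class SpeechEncoder(nn.Module):
    """Toy acoustic model: feature frames -> per-frame label logits."""
    def __init__(self, feat_dim=40, hidden=256, vocab=32):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, vocab)

    def forward(self, x):            # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.out(h)           # (batch, time, vocab)

def greedy_pseudo_labels(model, feats):
    """Greedy CTC decoding (stand-in for the paper's beam search):
    take the argmax path, collapse repeats, drop blanks."""
    with torch.no_grad():
        best = model(feats).argmax(-1)        # (batch, time)
    labels = []
    for seq in best:
        prev, out = BLANK, []
        for t in seq.tolist():
            if t != prev and t != BLANK:
                out.append(t)
            prev = t
        # arbitrary non-blank fallback; real code would drop empty hypotheses
        labels.append(torch.tensor(out or [1]))
    return labels

def ctc_loss_on_batch(model, feats, targets):
    logits = model(feats)                                      # (B, T, V)
    log_probs = F.log_softmax(logits, dim=-1).transpose(0, 1)  # (T, B, V)
    in_lens = torch.full((feats.size(0),), logits.size(1), dtype=torch.long)
    tgt_lens = torch.tensor([len(t) for t in targets])
    return F.ctc_loss(log_probs, torch.cat(targets), in_lens, tgt_lens, blank=BLANK)

# One self-training step: pseudo-label an unlabeled mini-batch with the
# *current* model, then update on supervised + pseudo-labeled data together.
model = SpeechEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

sup_feats = torch.randn(4, 100, 40)                     # supervised mini-batch
sup_tgts = [torch.randint(1, 32, (10,)) for _ in range(4)]
unsup_feats = torch.randn(4, 100, 40)                   # unlabeled mini-batch

pseudo_tgts = greedy_pseudo_labels(model, unsup_feats)  # fresh pseudo-labels
loss = (ctc_loss_on_batch(model, sup_feats, sup_tgts)
        + ctc_loss_on_batch(model, unsup_feats, pseudo_tgts))
opt.zero_grad()
loss.backward()
opt.step()
```

Note that the pseudo-labels are regenerated from the current model at each step, which is the "freshness" the abstract refers to; a faithful reproduction would also use beam-search decoding and data augmentation as studied in the paper.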
