Paper Title


Unsupervised Data Selection via Discrete Speech Representation for ASR

Paper Authors

Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani

Paper Abstract


Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple and effective unsupervised data selection method which selects acoustically similar speech to a target domain. It takes the discrete speech representation available in common self-supervised learning frameworks as input, and applies a contrastive data selection method on the discrete tokens. Through extensive empirical studies we show that our proposed method reduces the amount of required pre-training data and improves the downstream ASR performance. Pre-training on a selected subset of 6% of the general data pool results in 11.8% relative improvements in LibriSpeech test-other compared to pre-training on the full set. On Multilingual LibriSpeech French, German, and Spanish test sets, selecting 6% data for pre-training reduces word error rate by more than 15% relatively compared to the full set, and achieves competitive results compared to current state-of-the-art performances.
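To make the method concrete, below is a minimal sketch of what contrastive data selection over discrete speech tokens might look like. The count-based bigram language models, add-one smoothing, length normalization, and the function names (train_bigram_lm, contrastive_select) are illustrative assumptions, not the paper's exact implementation; the core idea follows the abstract: score each utterance's discrete token sequence by the log-likelihood ratio between a target-domain model and a general-pool model, then keep the top-scoring fraction (6% mirrors the abstract's setting).

    from collections import Counter
    import math

    def train_bigram_lm(sequences, vocab_size):
        """Count-based bigram LM with add-one smoothing over discrete token IDs.
        (Illustrative choice; any LM over the discrete tokens could be used.)"""
        bigrams, unigrams = Counter(), Counter()
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                bigrams[(a, b)] += 1
                unigrams[a] += 1

        def avg_log_prob(seq):
            # Average per-token log-probability, so longer utterances
            # are not systematically penalized.
            lp = 0.0
            for a, b in zip(seq, seq[1:]):
                lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            return lp / max(len(seq) - 1, 1)

        return avg_log_prob

    def contrastive_select(pool, target, vocab_size, fraction=0.06):
        """Rank utterances in the general pool by the in-domain vs. general
        log-likelihood ratio and keep the top `fraction` for pre-training."""
        lp_target = train_bigram_lm(target, vocab_size)
        lp_general = train_bigram_lm(pool, vocab_size)
        scored = sorted(pool, key=lambda s: lp_target(s) - lp_general(s), reverse=True)
        return scored[: max(1, int(len(scored) * fraction))]

The length-normalized log-likelihood-ratio score is the classic contrastive (Moore-Lewis-style) selection criterion; the key point from the paper is that it is applied to discrete token sequences already produced inside common self-supervised frameworks, so no transcripts are needed.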
