Paper Title


Unsupervised Data Selection via Discrete Speech Representation for ASR

Paper Authors

Zhiyun Lu, Yongqiang Wang, Yu Zhang, Wei Han, Zhehuai Chen, Parisa Haghani

Paper Abstract


Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple and effective unsupervised data selection method which selects acoustically similar speech to a target domain. It takes the discrete speech representation available in common self-supervised learning frameworks as input, and applies a contrastive data selection method on the discrete tokens. Through extensive empirical studies we show that our proposed method reduces the amount of required pre-training data and improves the downstream ASR performance. Pre-training on a selected subset of 6% of the general data pool results in 11.8% relative improvements in LibriSpeech test-other compared to pre-training on the full set. On Multilingual LibriSpeech French, German, and Spanish test sets, selecting 6% data for pre-training reduces word error rate by more than 15% relatively compared to the full set, and achieves competitive results compared to current state-of-the-art performances.
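To make the method concrete, below is a minimal sketch of what contrastive data selection over discrete speech tokens might look like. The count-based bigram language models, add-one smoothing, length normalization, and the function names (train_bigram_lm, contrastive_select) are illustrative assumptions, not the paper's exact implementation; the core idea follows the abstract: score each utterance's discrete token sequence by the log-likelihood ratio between a target-domain model and a general-pool model, then keep the top-scoring fraction (6% mirrors the abstract's setting).

    from collections import Counter
    import math

    def train_bigram_lm(sequences, vocab_size):
        """Count-based bigram LM with add-one smoothing over discrete token IDs.
        (Illustrative choice; any LM over the discrete tokens could be used.)"""
        bigrams, unigrams = Counter(), Counter()
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                bigrams[(a, b)] += 1
                unigrams[a] += 1

        def avg_log_prob(seq):
            # Average per-token log-probability, so longer utterances
            # are not systematically penalized.
            lp = 0.0
            for a, b in zip(seq, seq[1:]):
                lp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            return lp / max(len(seq) - 1, 1)

        return avg_log_prob

    def contrastive_select(pool, target, vocab_size, fraction=0.06):
        """Rank utterances in the general pool by the in-domain vs. general
        log-likelihood ratio and keep the top `fraction` for pre-training."""
        lp_target = train_bigram_lm(target, vocab_size)
        lp_general = train_bigram_lm(pool, vocab_size)
        scored = sorted(pool, key=lambda s: lp_target(s) - lp_general(s), reverse=True)
        return scored[: max(1, int(len(scored) * fraction))]

The length-normalized log-likelihood-ratio score is the classic contrastive (Moore-Lewis-style) selection criterion; the key point from the paper is that it is applied to discrete token sequences already produced inside common self-supervised frameworks, so no transcripts are needed.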
