Paper Title

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

Authors

Felix Kreuk, Joseph Keshet, Yossi Adi

Abstract

We propose a self-supervised representation learning model for the task of unsupervised phoneme boundary detection. The model is a convolutional neural network that operates directly on the raw waveform. It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle. At test time, a peak detection algorithm is applied over the model outputs to produce the final boundaries. As such, the proposed model is trained in a fully unsupervised manner with no manual annotations in the form of target boundaries or phonetic transcriptions. We compare the proposed approach to several unsupervised baselines using both TIMIT and Buckeye corpora. Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets. Furthermore, we experimented with expanding the training set with additional examples from the Librispeech corpus. We evaluated the resulting model on distributions and languages that were not seen during the training phase (English, Hebrew and German) and showed that utilizing additional untranscribed data is beneficial for model performance.
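
To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the two components: a strided 1-D CNN over the raw waveform trained with an NCE-style contrastive loss (adjacent frames are positives, randomly drawn frames from the same utterance are negatives), and a test-time peak detector over frame-to-frame dissimilarity. The layer sizes, cosine-similarity scoring, negative-sampling scheme, and thresholded local-maximum peak detector are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WavEncoder(nn.Module):
    """Strided 1-D CNN mapping a raw waveform to frame-level embeddings.
    Kernel/stride choices here are placeholders, not the paper's values."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2), nn.ReLU(),
        )

    def forward(self, wav):               # wav: (batch, samples)
        z = self.net(wav.unsqueeze(1))    # (batch, dim, frames)
        return z.transpose(1, 2)          # (batch, frames, dim)

def nce_loss(z, n_negatives=10, temperature=0.1):
    """Contrastive objective: each frame should score higher with its
    immediate successor than with random distractor frames."""
    anchors, positives = z[:, :-1], z[:, 1:]
    B, T, D = anchors.shape
    # Draw negatives from random time steps of the same utterance.
    neg_idx = torch.randint(0, T, (B, T * n_negatives), device=z.device)
    negatives = torch.gather(
        positives, 1, neg_idx.unsqueeze(-1).expand(-1, -1, D)
    ).view(B, T, n_negatives, D)
    pos_sim = F.cosine_similarity(anchors, positives, dim=-1)               # (B, T)
    neg_sim = F.cosine_similarity(anchors.unsqueeze(2), negatives, dim=-1)  # (B, T, K)
    logits = torch.cat([pos_sim.unsqueeze(-1), neg_sim], dim=-1) / temperature
    targets = torch.zeros(B, T, dtype=torch.long, device=z.device)  # positive is index 0
    return F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))

def detect_boundaries(z, threshold=0.5):
    """Test time: dissimilarity between consecutive frames peaks at
    spectral changes; local maxima above a threshold become boundaries."""
    d = 1.0 - F.cosine_similarity(z[:, :-1], z[:, 1:], dim=-1)  # (B, frames-1)
    lo = d.min(dim=1, keepdim=True).values
    hi = d.max(dim=1, keepdim=True).values
    d = (d - lo) / (hi - lo + 1e-8)                             # per-utterance min-max norm
    peaks = (d[:, 1:-1] > d[:, :-2]) & (d[:, 1:-1] > d[:, 2:]) & (d[:, 1:-1] > threshold)
    return [p.nonzero(as_tuple=True)[0] + 1 for p in peaks]     # frame indices per utterance
```

A training step under this sketch would simply minimize `nce_loss(encoder(wav))` over batches of raw waveforms; no boundary labels or transcriptions enter the loss, which is what makes the setup fully unsupervised.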
