Paper Title

Towards Learning a Universal Non-Semantic Representation of Speech

Paper Authors

Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Felix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv

Paper Abstract

The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a pre-existing embedding model trained for different datasets or tasks. The visual and language communities have established benchmarks to compare embeddings, but the speech community has yet to do so. This paper proposes a benchmark for comparing speech representations on non-semantic tasks, and proposes a representation based on an unsupervised triplet-loss objective. The proposed representation outperforms other representations on the benchmark, and even exceeds state-of-the-art performance on a number of transfer learning tasks. The embedding is trained on a publicly available dataset, and it is tested on a variety of low-resource downstream tasks, including personalization tasks and the medical domain. The benchmark, models, and evaluation code are publicly released.
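
The abstract states that the representation is learned with an unsupervised triplet-loss objective. As a rough sketch of what a triplet loss looks like in general (not the paper's exact formulation; the function name, margin value, distance choice, and sampling comment below are illustrative assumptions), in Python:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge-style triplet loss over batches of embedding vectors.

    anchor, positive, negative: arrays of shape (batch, dim).
    The margin value and squared-Euclidean distance are illustrative
    choices, not necessarily the paper's exact settings.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # distance to positive example
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # distance to negative example
    # Encourage anchors to be closer to positives than to negatives by at least `margin`.
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

# In a self-supervised audio setup, positives are often segments sampled close
# together in time and negatives drawn from elsewhere (an assumption about the
# general approach, not a claim about this paper's sampling scheme).
rng = np.random.default_rng(0)
anchor, positive, negative = (rng.normal(size=(4, 512)) for _ in range(3))
print(triplet_loss(anchor, positive, negative))
```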
