Paper Title

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings

Authors

Puyuan Peng, Herman Kamper, Karen Livescu

Abstract

We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation. The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages. Our model, which we refer to as a maximal sampling correspondence variational autoencoder (MCVAE), is a recurrent neural network (RNN) trained with a novel self-supervised correspondence loss that encourages consistency between embeddings of different instances of the same word. Our training scheme improves on previous correspondence training approaches through the use and comparison of multiple samples from the approximate posterior distribution. In the zero-resource setting, the MCVAE can be trained in an unsupervised way, without any ground-truth word pairs, by using the word-like segments discovered via an unsupervised term discovery system. In both this setting and a semi-supervised low-resource setting (with a limited set of ground-truth word pairs), the MCVAE outperforms previous state-of-the-art models, such as Siamese-, CAE- and VAE-based RNNs.
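The abstract's key ideas are (1) sampling multiple latent vectors from each instance's approximate posterior and (2) a correspondence term that pulls embeddings of two instances of the same word together. The paper's actual loss is not specified here, so the following is only an illustrative numpy sketch of a "maximal-sampling" consistency term under assumed details: `k` reparameterized samples per instance, cross-pair squared distances as the matching score, and the best-matching (maximal) sample pair supplying the loss. All function names and the distance-based score are hypothetical, not taken from the paper.

```python
import numpy as np

def sample_latents(mu, logvar, k, rng):
    """Draw k reparameterized samples z = mu + sigma * eps from the
    approximate posterior N(mu, diag(exp(logvar)))."""
    std = np.exp(0.5 * logvar)
    eps = rng.standard_normal((k,) + mu.shape)
    return mu + std * eps

def max_sampling_correspondence_loss(mu_a, logvar_a, mu_b, logvar_b,
                                     k=8, rng=None):
    """Hypothetical sketch of a maximal-sampling correspondence term:
    sample k latents for each of two instances of the same word, score
    every cross pair, and keep only the closest pair. (The paper's MCVAE
    compares posterior samples via a reconstruction likelihood; squared
    distance is used here purely as a stand-in score.)"""
    rng = rng or np.random.default_rng(0)
    za = sample_latents(mu_a, logvar_a, k, rng)   # shape (k, d)
    zb = sample_latents(mu_b, logvar_b, k, rng)   # shape (k, d)
    # pairwise squared distances between all k x k sample pairs
    d2 = ((za[:, None, :] - zb[None, :, :]) ** 2).sum(axis=-1)
    # the "maximal" sample pair: in a real framework, gradients would
    # flow only through the best-matching pair
    return d2.min()

# toy usage: two instances of the same word with nearby posterior means
mu_a, mu_b = np.zeros(4), 0.1 * np.ones(4)
logvar = -2.0 * np.ones(4)
loss = max_sampling_correspondence_loss(mu_a, logvar, mu_b, logvar)
print(float(loss))
```

Taking the minimum over sample pairs (rather than averaging) means a single well-matched sample suffices, which is the intuition behind comparing multiple posterior samples instead of only the means.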
