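Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion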

Paper Title

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

Authors

Huang, Wen-Chin; Luo, Hao; Hwang, Hsin-Te; Lo, Chen-Chou; Peng, Yu-Huai; Tsao, Yu; Wang, Hsin-Min

Abstract

An effective approach to voice conversion (VC) is to disentangle linguistic content from the other components of the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, relies strongly on this principle. In our prior work, we proposed a cross-domain VAE-VC (CDVAE-VC) framework, which used acoustic features with different properties to improve the performance of VAE-VC. We believed that its success came from more disentangled latent representations. In this paper, we extend the CDVAE-VC framework by incorporating adversarial learning in order to further increase the degree of disentanglement, thereby improving the quality and similarity of the converted speech. More specifically, we first investigate the effectiveness of combining generative adversarial networks (GANs) with CDVAE-VC. Then, following the idea of domain adversarial training, we add an explicit constraint to the latent representation, realized by a speaker classifier, to eliminate the speaker information that resides in the latent code. Experimental results confirm that both GANs and the speaker classifier enhance the degree of disentanglement of the learned latent representation. Meanwhile, subjective evaluation results in terms of quality and similarity scores demonstrate the effectiveness of the proposed methods.
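
The speaker-classifier constraint mentioned in the abstract can be illustrated with the generic form of a domain adversarial objective. This is only a sketch under stated assumptions, not the paper's exact loss: the symbols $E$ (encoder), $G$ (decoder), $C$ (speaker classifier), $\mathcal{L}_{\mathrm{VAE}}$ (the usual VAE reconstruction plus KL objective), and the weight $\lambda$ are notation introduced here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   x : acoustic feature frame,  y : speaker label of x
%   E : encoder,  G : decoder,  C : speaker classifier on the latent code
%   \lambda > 0 : hypothetical weight of the adversarial term

% Speaker classification loss on the latent code z = E(x)
\mathcal{L}_{\mathrm{cls}}(C, E)
  = -\,\mathbb{E}_{(x,\,y)}\bigl[\log C\bigl(y \mid E(x)\bigr)\bigr]

% The classifier is trained to identify the speaker from the latent code
\min_{C}\; \mathcal{L}_{\mathrm{cls}}(C, E)

% The encoder and decoder are trained to reconstruct well while making
% the classifier fail, which pushes speaker information out of z
\min_{E,\,G}\; \mathcal{L}_{\mathrm{VAE}}(E, G) - \lambda\, \mathcal{L}_{\mathrm{cls}}(C, E)
```

In this formulation the two minimizations are alternated (or implemented jointly with a gradient reversal step), so a latent code from which the classifier cannot recover the speaker is, in this sense, more disentangled.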
