Paper Title

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

Paper Authors

Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li

Paper Abstract

Singing voice conversion aims to convert a singer's voice from a source to a target without changing the singing content. Parallel training data is typically required to train a singing voice conversion system, which is however not practical in real-life applications. Recent encoder-decoder structures, such as the variational autoencoding Wasserstein generative adversarial network (VAW-GAN), provide an effective way to learn a mapping from non-parallel training data. In this paper, we propose a singing voice conversion framework based on VAW-GAN. We train an encoder to disentangle singer identity and singing prosody (F0 contour) from phonetic content. By conditioning on singer identity and F0, the decoder generates output spectral features with the unseen target singer identity and improves the F0 rendering. Experimental results show that the proposed framework achieves better performance than the baseline frameworks.
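To make the conditioning idea in the abstract concrete, the sketch below shows a decoder that takes a content latent code together with a singer-identity embedding and an F0 contour, in the spirit of the described encoder-decoder. This is a minimal illustrative assumption, not the authors' VAW-GAN implementation: class names, layer sizes, and feature dimensions (e.g. 513-dimensional spectra) are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): encoder extracts a content
# latent; decoder is conditioned on a singer-identity embedding and F0.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps spectral frames to a latent code intended to capture phonetic content."""
    def __init__(self, spec_dim=513, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(spec_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, spec):           # spec: (batch, frames, spec_dim)
        return self.net(spec)          # latent: (batch, frames, latent_dim)

class Decoder(nn.Module):
    """Reconstructs spectra from the latent code, a singer embedding, and F0."""
    def __init__(self, spec_dim=513, latent_dim=128, num_singers=10, singer_dim=16):
        super().__init__()
        self.singer_emb = nn.Embedding(num_singers, singer_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + singer_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, spec_dim),
        )

    def forward(self, latent, singer_id, f0):
        # singer_id: (batch,) integer ids; f0: (batch, frames, 1) contour
        frames = latent.size(1)
        emb = self.singer_emb(singer_id).unsqueeze(1).expand(-1, frames, -1)
        return self.net(torch.cat([latent, emb, f0], dim=-1))

# Conversion at inference: encode the source spectra, then decode with the
# target singer's id and the (possibly transformed) source F0 contour.
encoder, decoder = Encoder(), Decoder()
spec = torch.randn(2, 100, 513)        # dummy source spectral features
f0 = torch.rand(2, 100, 1)             # dummy normalized F0 contour
target_id = torch.tensor([3, 3])       # target singer index
converted = decoder(encoder(spec), target_id, f0)
print(converted.shape)                 # torch.Size([2, 100, 513])
```

In the actual framework, the decoder output would be trained with VAW-GAN objectives (variational and adversarial losses) rather than the plain forward pass shown here; the sketch only illustrates how singer identity and F0 enter as conditioning inputs.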
