Speech-to-Singing Conversion based on Boundary Equilibrium GAN

Paper Title

Speech-to-Singing Conversion based on Boundary Equilibrium GAN

Authors

Da-Yi Wu, Yi-Hsuan Yang

Abstract

This paper investigates the use of generative adversarial network (GAN)-based models for converting the spectrogram of a speech signal into that of a singing signal, without reference to the phoneme sequence underlying the speech. This is achieved by treating speech-to-singing conversion as a style transfer problem. Specifically, given a speech input and, optionally, the F0 contour of the target singing, the proposed model generates a singing signal as output, using a progressive-growing encoder/decoder architecture and boundary equilibrium GAN (BEGAN) loss functions. Our quantitative and qualitative analyses show that the proposed model generates singing voices with much higher naturalness than an existing baseline trained without an adversarial loss. For reproducibility, the code will be made publicly available in a GitHub repository upon publication.
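The abstract names boundary equilibrium GAN loss functions without stating them. The sketch below illustrates the generic formulation from Berthelot et al. (2017), in which the discriminator is an autoencoder and its reconstruction error serves as the "energy" of an input. It is an assumption for illustration, not the authors' released code: the function names `recon_loss` and `began_step`, the default values of `gamma` and `lambda_k`, and the use of flattened spectrograms as plain lists of floats (instead of tensors) are all hypothetical choices made to keep it dependency-free.

```python
"""Hypothetical sketch of the BEGAN objective (Berthelot et al., 2017).

Not taken from the speech-to-singing paper's code; spectrograms are
flattened to plain lists of floats purely for illustration.
"""


def recon_loss(x, x_rec, eta=1):
    """Mean of |x - D(x)|**eta over all elements (autoencoder energy L(v))."""
    if not x or len(x) != len(x_rec):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) ** eta for a, b in zip(x, x_rec)) / len(x)


def began_step(loss_real, loss_fake, k, gamma=0.5, lambda_k=0.001):
    """Compute BEGAN losses and update the balancing coefficient k_t.

    loss_real: L(x), reconstruction error of the discriminator on real singing.
    loss_fake: L(G(z)), reconstruction error on generated singing.
    k: current balancing coefficient k_t, kept in [0, 1].
    gamma, lambda_k: illustrative defaults (diversity ratio, k learning rate).
    Returns (d_loss, g_loss, k_next, m_global).
    """
    d_loss = loss_real - k * loss_fake        # L_D = L(x) - k_t * L(G(z))
    g_loss = loss_fake                        # L_G = L(G(z))
    balance = gamma * loss_real - loss_fake   # gamma * L(x) - L(G(z))
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)  # clip k_{t+1} to [0, 1]
    m_global = loss_real + abs(balance)       # BEGAN convergence measure
    return d_loss, g_loss, k_next, m_global


if __name__ == "__main__":
    # Toy "spectrogram" frames and hypothetical discriminator reconstructions.
    real, real_rec = [0.2, 0.4, 0.6, 0.8], [0.25, 0.35, 0.6, 0.7]
    fake, fake_rec = [0.1, 0.9, 0.5, 0.3], [0.3, 0.6, 0.5, 0.5]
    l_real = recon_loss(real, real_rec)
    l_fake = recon_loss(fake, fake_rec)
    print(began_step(l_real, l_fake, k=0.0))
```

The coefficient `k` lets the discriminator gradually shift weight toward penalizing generated samples only as the generator improves, which is the equilibrium idea behind BEGAN; `m_global` is a single number that can be logged to monitor convergence during training.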
