Paper title
Speech-to-Singing Conversion based on Boundary Equilibrium GAN
Paper authors
Abstract
This paper investigates the use of generative adversarial network (GAN)-based models for converting the spectrogram of a speech signal into that of a singing signal, without reference to the phoneme sequence underlying the speech. This is achieved by treating speech-to-singing conversion as a style transfer problem. Specifically, given a speech input, and optionally the F0 contour of the target singing, the proposed model generates a singing signal using a progressive-growing encoder/decoder architecture and boundary equilibrium GAN loss functions. Our quantitative and qualitative analyses show that the proposed model generates singing voices with much higher naturalness than an existing non-adversarially trained baseline. For reproducibility, the code will be made publicly available in a GitHub repository upon paper publication.