Paper Title
Quasi-Periodic Parallel WaveGAN: A Non-autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network
Authors
Abstract
In this paper, we propose a quasi-periodic parallel WaveGAN (QPPWG) waveform generative model, which applies a quasi-periodic (QP) structure to a parallel WaveGAN (PWG) model using pitch-dependent dilated convolution networks (PDCNNs). PWG is a small-footprint GAN-based raw waveform generative model, whose generation time is much faster than real time because of its compact model and non-autoregressive (non-AR) and non-causal mechanisms. Although PWG achieves high-fidelity speech generation, the generic and simple network architecture lacks pitch controllability for an unseen auxiliary fundamental frequency ($F_{0}$) feature such as a scaled $F_{0}$. To improve the pitch controllability and speech modeling capability, we apply a QP structure with PDCNNs to PWG, which introduces pitch information to the network by dynamically changing the network architecture corresponding to the auxiliary $F_{0}$ feature. Both objective and subjective experimental results show that QPPWG outperforms PWG when the auxiliary $F_{0}$ feature is scaled. Moreover, analyses of the intermediate outputs of QPPWG also show better tractability and interpretability of QPPWG, which respectively models spectral and excitation-like signals using the cascaded fixed and adaptive blocks of the QP structure.
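The idea of "dynamically changing the network architecture corresponding to the auxiliary $F_{0}$ feature" can be illustrated with a minimal NumPy sketch of a pitch-dependent dilated convolution. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the dilation scaling rule (sampling rate divided by $F_{0}$ times a "dense factor"), the default values, the treatment of unvoiced frames, and the hypothetical function names `pitch_dependent_dilations` and `pd_dilated_conv`. It is not the authors' implementation.

```python
import numpy as np


def pitch_dependent_dilations(f0, sample_rate=24000, dense_factor=4, base_dilation=1):
    """Per-sample dilation sizes that follow the pitch period (assumed rule).

    Assumption: dilation = round(sample_rate / (f0 * dense_factor)) * base_dilation.
    Unvoiced samples (f0 <= 0) fall back to base_dilation, also an assumption.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    safe_f0 = np.where(voiced, f0, 1.0)  # avoid division by zero
    factor = np.where(voiced, sample_rate / (safe_f0 * dense_factor), 1.0)
    return np.maximum(1, np.rint(factor * base_dilation).astype(int))


def pd_dilated_conv(x, dilations, w_past, w_cur, w_future):
    """Non-causal, kernel-size-3 convolution with a time-varying dilation.

    y[t] = w_past * x[t - d[t]] + w_cur * x[t] + w_future * x[t + d[t]],
    with zeros used for taps that fall outside the signal.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    past, future = t - dilations, t + dilations
    x_past = np.where(past >= 0, x[np.clip(past, 0, n - 1)], 0.0)
    x_future = np.where(future < n, x[np.clip(future, 0, n - 1)], 0.0)
    return w_past * x_past + w_cur * x + w_future * x_future


# Doubling F0 halves the dilation on voiced samples, so the receptive
# field tracks the shorter pitch period without retraining any weights.
f0 = np.array([0.0, 100.0, 200.0])
d_original = pitch_dependent_dilations(f0)
d_scaled = pitch_dependent_dilations(2.0 * f0)
```

Under these assumptions, scaling the auxiliary $F_{0}$ feature changes which past and future samples each layer sees, which is one way to read the abstract's claim that the QP structure improves pitch controllability for unseen $F_{0}$ values.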