Paper Title
SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis
Paper Authors
Paper Abstract
Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize because each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x to 214x fewer multiply-accumulate operations (MACs). Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave.