SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

Paper title

SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

Authors

Bohan Zhai, Tianren Gao, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph E. Gonzalez, Kurt Keutzer

Abstract

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x - 214x fewer MACs. Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave.

Paper title