Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge

Paper Title

Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge

Authors

Sangjun Park, Kihyun Choo, Joohyung Lee, Anton V. Porov, Konstantin Osipov, June Sig Sung

Abstract

Text-to-Speech (TTS) services that run on edge devices have several advantages over cloud TTS, such as lower latency and better privacy. However, neural vocoders with low complexity and a small model footprint inevitably generate annoying audible artifacts. This study proposes Bunched LPCNet2, an improved LPCNet architecture that delivers high-quality, efficient synthesis on cloud servers and low-complexity synthesis on low-resource edge devices. Using a single logistic distribution for the output improves computational efficiency, and several techniques reduce the model footprint while maintaining speech quality. A DualRate architecture, which generates speech at a lower sampling rate from a prosody model, is also proposed to reduce maintenance costs. The experiments demonstrate that Bunched LPCNet2 produces satisfactory speech quality with a model footprint of 1.1 MB while running faster than real time on a Raspberry Pi 3B. Our audio samples are available at https://srtts.github.io/bunchedLPCNet2.
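To illustrate why a single logistic distribution is cheap, the sketch below contrasts it with the 256-way mu-law softmax used as the output layer in the original LPCNet. It is a minimal sketch under stated assumptions, not the authors' implementation: the names sample_logistic and sample_softmax, the (mu, log_s) parameterization, and the [-1, 1] signal range are all hypothetical. Sampling from Logistic(mu, s) needs only one uniform draw and the inverse CDF x = mu + s * log(u / (1 - u)), whereas the softmax needs one exponential per class (256 per sample).

```python
import math
import random


def sample_logistic(mu: float, log_s: float, rng: random.Random) -> float:
    """Draw one sample from Logistic(mu, s) via the inverse CDF.

    Hypothetical output head: the network is assumed to predict a location
    `mu` and a log-scale `log_s` for the next sample.
    """
    s = math.exp(log_s)
    # Keep u strictly inside (0, 1) so the logit log(u / (1 - u)) stays finite.
    u = min(max(rng.random(), 1e-5), 1.0 - 1e-5)
    x = mu + s * (math.log(u) - math.log1p(-u))
    # Assumption: the signal is normalized to [-1, 1], so clip to that range.
    return max(-1.0, min(1.0, x))


def sample_softmax(logits: list[float], rng: random.Random) -> int:
    """Baseline for comparison: sample one class index, e.g. one of 256 mu-law bins."""
    m = max(logits)
    # One exp per class: this per-sample cost is what a single logistic avoids.
    weights = [math.exp(z - m) for z in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_logistic(0.1, -4.0, rng))
    print(sample_softmax([0.0] * 256, rng))
```

The cost per generated sample for the logistic head is constant (one exp and two logs), independent of output resolution, which is the kind of saving that matters when synthesis must run faster than real time on a small CPU.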
