Paper Title
SNIPER Training: Single-Shot Sparse Training for Text-to-Speech
Paper Authors
Paper Abstract
Text-to-speech (TTS) models have achieved remarkable naturalness in recent years, yet like most deep neural models, they have more parameters than necessary. Sparse TTS models can improve on dense models via pruning and extra retraining, or converge faster than dense models with some performance loss. Thus, we propose training TTS models using decaying sparsity, i.e. a high initial sparsity to accelerate training first, followed by a progressive rate reduction to obtain better eventual performance. This decremental approach differs from current methods of incrementing sparsity to a desired target, which costs significantly more time than dense training. We call our method SNIPER training: Single-shot Initialization Pruning Evolving-Rate training. Our experiments on FastSpeech2 show that we were able to obtain better losses in the first few training epochs with SNIPER, and that the final SNIPER-trained models outperformed constant-sparsity models and edged out dense models, with negligible difference in training time.
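The abstract specifies the recipe only at a high level: score and prune the network once at initialization (in the style of SNIP, Lee et al., 2019, from which the method's name derives), then decay the sparsity rate as training proceeds rather than incrementing it. The sketch below illustrates that idea in PyTorch; it is not the authors' implementation. The linear decay schedule, the initial sparsity value, the |w · ∇L| saliency score, and all helper names (snip_saliency, masks_for_sparsity, sparsity_at_epoch) are assumptions for illustration, with a toy regression model standing in for FastSpeech2.

```python
import torch
import torch.nn as nn

# Illustrative sketch of decaying-sparsity training (not the paper's code).

def snip_saliency(model, loss_fn, inputs, targets):
    """Per-weight saliency |w * dL/dw| from one batch, scored once at init,
    in the style of SNIP (Lee et al., 2019)."""
    weights = [p for p in model.parameters() if p.dim() > 1]
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, weights)
    return weights, [(w * g).abs() for w, g in zip(weights, grads)]

def masks_for_sparsity(scores, sparsity):
    """Global threshold keeping the top (1 - sparsity) fraction of scores."""
    flat = torch.cat([s.flatten() for s in scores])
    k = max(1, int((1.0 - sparsity) * flat.numel()))
    thresh = torch.topk(flat, k).values.min()
    return [(s >= thresh).float() for s in scores]

def sparsity_at_epoch(epoch, s0=0.5, decay_epochs=20):
    """Assumed schedule: decay linearly from s0 to 0 (dense) over
    decay_epochs. The paper's exact schedule and s0 are not given
    in the abstract."""
    return max(0.0, s0 * (1.0 - epoch / decay_epochs))

# Toy usage with a stand-in regression model; a real TTS model such as
# FastSpeech2 would slot in the same way.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 16), torch.randn(32, 1)

weights, scores = snip_saliency(model, loss_fn, x, y)  # single-shot scoring

for epoch in range(30):
    masks = masks_for_sparsity(scores, sparsity_at_epoch(epoch))
    with torch.no_grad():                       # zero out pruned weights
        for w, m in zip(weights, masks):
            w.mul_(m)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    for w, m in zip(weights, masks):            # keep pruned weights at zero
        w.grad.mul_(m)
    opt.step()
```

Because the mask is always recomputed from the fixed initialization scores, weights re-enter in saliency order as sparsity decays, with no re-scoring or retraining pass; this is consistent with the abstract's claim of a negligible difference in training time relative to dense training.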