FastPitch: Parallel Text-to-speech with Pitch Prediction

Paper title

FastPitch: Parallel Text-to-speech with Pitch Prediction

Authors

Łańcucki, Adrian

Abstract

We present FastPitch, a fully-parallel text-to-speech model based on FastSpeech, conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be more expressive, better match the semantics of the utterance, and ultimately be more engaging to the listener. Uniformly increasing or decreasing pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency contours improves the overall quality of synthesized speech, making it comparable to the state of the art. This conditioning introduces no overhead, and FastPitch retains the favorable, fully-parallel Transformer architecture, with a real-time factor of over 900x for mel-spectrogram synthesis of a typical utterance.

Paper title