LibriS2S: A German-English Speech-to-Speech Translation Corpus

Paper Title

LibriS2S: A German-English Speech-to-Speech Translation Corpus

Authors

Pedro Jeuris, Jan Niehues

Abstract

Recently, we have seen increasing interest in the area of speech-to-text translation, which has led to astonishing improvements in this area. In contrast, activity in the area of speech-to-speech translation is still limited, although it is essential for overcoming the language barrier. We believe that one of the limiting factors is the availability of appropriate training data. We address this issue by creating LibriS2S, to our knowledge the first publicly available speech-to-speech training corpus between German and English. For this corpus, we used independently created audio for German and English, leading to an unbiased pronunciation of the text in both languages. This allows the creation of new text-to-speech and speech-to-speech translation models that directly learn to generate the speech signal based on the pronunciation of the source language. Using this corpus, we propose text-to-speech models, based on the recently proposed FastSpeech 2 model, that integrate source language information. We do this by adapting the model to take information such as the pitch, energy, or transcript of the source speech as additional input.
