Paper title
ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English
Paper authors
Paper abstract
We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and corpus publicly available. We also report results for baseline systems for machine translation and speech translation tasks. We believe this is a valuable resource that can motivate and facilitate further research studying the code-switching phenomenon from a linguistic perspective and can be used to train and evaluate NLP systems.