Paper Title


Reducing language context confusion for end-to-end code-switching automatic speech recognition

Authors

Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng

Abstract


Code-switching involves alternating between languages in the course of communication. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging, as code-switching training data are always insufficient to combat the increased multilingual context confusion caused by the presence of more than one language. We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model, based on the Equivalence Constraint (EC) theory. This linguistic theory requires that any monolingual fragment occurring in a code-switching sentence must also occur in one of the monolingual sentences. The theory thus establishes a bridge between monolingual data and code-switching data, and we leverage it to design the code-switching E2E ASR model. The proposed model efficiently transfers language knowledge from abundant monolingual data to improve the performance of the code-switching ASR model. We evaluate our model on the ASRU 2019 Mandarin-English code-switching challenge dataset. Compared to the baseline model, our proposed model achieves a 17.12% relative error reduction.
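The EC constraint stated in the abstract is a directly checkable property of data. Below is a minimal sketch (not the authors' implementation) of that check, assuming hypothetical toy corpora and a crude Unicode-range language tagger; all function names and example sentences are illustrative only.

```python
# Sketch of the Equivalence Constraint (EC) check described in the abstract:
# every monolingual fragment of a code-switching sentence must also occur
# contiguously in some monolingual sentence. Toy data, hypothetical helpers.

def lang_of(token: str) -> str:
    """Crude language tag: any CJK character -> 'zh', otherwise 'en'."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in token) else "en"

def monolingual_fragments(tokens):
    """Split a code-switching token list into maximal same-language runs."""
    fragments, current, current_lang = [], [], None
    for tok in tokens:
        lang = lang_of(tok)
        if current and lang != current_lang:
            fragments.append((current_lang, current))
            current = []
        current_lang = lang
        current.append(tok)
    if current:
        fragments.append((current_lang, current))
    return fragments

def contains_sublist(seq, sub):
    """True if `sub` occurs as a contiguous run inside `seq`."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))

def satisfies_ec(tokens, corpora):
    """True if every monolingual fragment of `tokens` appears contiguously
    in at least one sentence of the matching-language corpus."""
    return all(
        any(contains_sublist(sent, frag) for sent in corpora[lang])
        for lang, frag in monolingual_fragments(tokens)
    )
```

For example, with an English corpus containing "I really like this movie" and a Mandarin corpus containing "我 很 喜欢 它", the code-switched sentence "我 很 喜欢 this movie" satisfies the constraint, since both of its monolingual fragments occur in the corresponding monolingual corpus.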
