Title

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

Authors

Li Li, Dongxing Xu, Haoran Wei, Yanhua Long

Abstract

Exploiting effective target modeling units is very important and has always been a concern in end-to-end automatic speech recognition (ASR). In this work, we propose a phonetic-assisted multi-target units (PMU) modeling approach to enhance the Conformer-Transducer ASR system in a progressive representation learning manner. Specifically, PMU first uses pronunciation-assisted subword modeling (PASM) and byte pair encoding (BPE) to produce phonetic-induced and text-induced target units, respectively. Then, three new frameworks are investigated to enhance the acoustic encoder: a basic PMU, a paraCTC, and a pcaCTC. They integrate the PASM and BPE units at different levels for CTC and transducer multi-task training. Experiments on both LibriSpeech and accented ASR tasks show that the proposed PMU significantly outperforms the conventional BPE: it reduces the WER on the LibriSpeech clean and other test sets and on six accented ASR test sets by 12.7%, 6.0%, and 7.7% relative, respectively.
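
The abstract describes training the acoustic encoder jointly with a transducer loss and CTC losses over two unit inventories (PASM and BPE). The sketch below is not the authors' implementation. It assumes one possible setup in which two CTC heads emit per-frame log-probabilities, one over PASM units and one over BPE units, and the total objective is a weighted sum with a transducer loss that is taken as a precomputed number. The function names `ctc_nll` and `pmu_multitask_loss` and the weights `w_pasm` and `w_bpe` are hypothetical. The abstract does not say how the basic PMU, paraCTC, and pcaCTC variants place these heads or weight the losses. The CTC term uses the standard CTC forward algorithm in log space.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) over a few scalars."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs[t][k] is the log-probability of unit k at frame t. The same
    routine would serve a PASM head or a BPE head (hypothetical setup).
    """
    # Extended label sequence with blanks interleaved: _ y1 _ y2 _ ... _
    ext = [blank]
    for u in target:
        ext += [u, blank]
    S, T = len(ext), len(log_probs)

    # Frame 0: a path may start with the leading blank or the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one position
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or the trailing blank.
    total = alpha[-1] if S == 1 else logsumexp(alpha[-1], alpha[-2])
    return -total


def pmu_multitask_loss(transducer_loss: float, pasm_ctc: float, bpe_ctc: float,
                       w_pasm: float = 0.3, w_bpe: float = 0.3) -> float:
    """Weighted multi-task objective; the weights are illustrative assumptions."""
    if not 0.0 <= w_pasm + w_bpe < 1.0:
        raise ValueError("CTC weights must sum to a value in [0, 1)")
    w_trans = 1.0 - w_pasm - w_bpe
    return w_trans * transducer_loss + w_pasm * pasm_ctc + w_bpe * bpe_ctc
```

For example, with two frames of uniform probabilities over {blank, `a`}, the target `a` has three valid alignments (`a a`, `_ a`, `a _`), each with probability 0.25, so `ctc_nll` returns -log(0.75). Keeping the CTC heads as auxiliary terms next to the transducer loss is one common way to let two unit inventories shape the same encoder.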
