Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition

Paper title

Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition

Authors

Xurong Xie, Xiang Sui, Xunying Liu, Lan Wang

Abstract

Mandarin Chinese is known to be strongly influenced by a rich set of regional accents, while the Mandarin speech data available for each accent is quite limited. Hence, an important task in Mandarin speech recognition is to appropriately model the acoustic variability imposed by accents. This paper investigates the implicit and explicit use of accent information in a range of deep neural network (DNN) based acoustic modelling techniques. Multi-accent modelling approaches, including multi-style training, multi-accent decision tree state tying, DNN tandem, and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling, are combined and compared. On a low resource accented Mandarin speech recognition task consisting of four regional accents, an improved MLAN tandem HMM system that explicitly leverages accent information is proposed. It significantly outperforms the baseline accent-independent DNN tandem systems by 0.8%-1.5% absolute (6%-9% relative) in character error rate after sequence-level discriminative training and adaptation.
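The abstract reports gains in character error rate (CER) as both absolute and relative reductions. As a rough illustration of how those two figures relate, here is a minimal Python sketch. It assumes the common definition of CER as character-level edit distance divided by reference length; the function names and all example values are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent (assumed definition: edits / reference length)."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def reductions(baseline_cer, improved_cer):
    """Return (absolute, relative) CER reduction, both in percent."""
    absolute = baseline_cer - improved_cer
    relative = 100.0 * absolute / baseline_cer
    return absolute, relative


# Hypothetical values for illustration only, not results from the paper.
absolute, relative = reductions(20.0, 18.0)
print(f"absolute: {absolute:.1f}%, relative: {relative:.1f}%")
```

With these made-up numbers, a drop from 20.0% to 18.0% CER is 2.0% absolute and 10.0% relative. The same relation (relative = absolute / baseline) links the 0.8%-1.5% absolute and 6%-9% relative figures quoted in the abstract.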
