Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

Paper Title

Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

Authors

Anurenjan Purushothaman, Anirudh Sreeram, Rohit Kumar, Sriram Ganapathy

Abstract

Automatic speech recognition in reverberant conditions is a challenging task because the long-term envelopes of reverberant speech are temporally smeared. In this paper, we propose a neural model that enhances sub-band temporal envelopes for dereverberation of speech. The temporal envelopes are derived using the autoregressive modeling framework of frequency domain linear prediction (FDLP). The proposed neural enhancement model performs an envelope-gain-based enhancement of the temporal envelopes and consists of a series of convolutional and recurrent neural network layers. The enhanced sub-band envelopes are used to generate features for automatic speech recognition (ASR). The ASR experiments are performed on the REVERB challenge dataset as well as the CHiME-3 dataset. In these experiments, the proposed neural enhancement approach provides significant improvements over a baseline ASR system with beamformed audio (average relative improvements in word error rate of 21% on the development set and about 11% on the evaluation set of the REVERB challenge dataset).
