Paper Title
Dynamic Noise Embedding: Noise Aware Training and Adaptation for Speech Enhancement
Paper Authors
Abstract
Accurately estimating noise information is crucial for noise-aware training in speech applications, including speech enhancement (SE), which is the focus of this paper. To estimate noise-only frames, we employ voice activity detection (VAD) and detect non-speech frames by applying an optimal threshold to the speech posterior. These non-speech frames can be regarded as the noise-only frames of the noisy signal. The estimated frames are used to extract a noise embedding, named dynamic noise embedding (DNE), which helps an SE module capture the characteristics of the background noise. The DNE is extracted by a simple neural network, and the SE module with the DNE can be jointly trained to adapt to the environment. Experiments are conducted on the TIMIT dataset for a single-channel denoising task, with U-Net as the backbone SE module. Experimental results show that the DNE plays an important role in the SE module, increasing the quality and intelligibility of the corrupted signal even when the noise is non-stationary and unseen in training. In addition, we demonstrate that the DNE can be flexibly applied to other neural network-based SE modules.
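The frame-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed threshold of 0.5 (the paper uses an optimal threshold that is not given here), and the single tanh layer averaged over frames are all assumptions made for illustration, as are the toy feature values.

```python
import math


def select_noise_frames(frames, speech_posterior, threshold=0.5):
    """Keep frames whose VAD speech posterior falls below the threshold.

    The threshold value is an assumption; the abstract only says an
    optimal threshold is applied to the speech posterior.
    """
    return [f for f, p in zip(frames, speech_posterior) if p < threshold]


def noise_embedding(noise_frames, weights, bias):
    """Hypothetical embedding: one tanh dense layer, averaged over frames.

    weights holds one row per output dimension, each row as long as a frame.
    Returns a zero vector when no noise-only frame was found.
    """
    dim = len(bias)
    if not noise_frames:
        return [0.0] * dim
    total = [0.0] * dim
    for frame in noise_frames:
        for j in range(dim):
            activation = sum(w * x for w, x in zip(weights[j], frame)) + bias[j]
            total[j] += math.tanh(activation)
    return [t / len(noise_frames) for t in total]


# Toy input: four frames with two feature bins each, plus VAD posteriors.
frames = [[0.2, 0.1], [1.5, 1.2], [0.3, 0.0], [1.4, 1.1]]
speech_posterior = [0.1, 0.9, 0.2, 0.8]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bias = [0.0, 0.0, 0.0]

noise_frames = select_noise_frames(frames, speech_posterior)
embedding = noise_embedding(noise_frames, weights, bias)
```

In a full system, a vector like `embedding` would be passed to the SE module as an additional input, and the embedding network would be trained jointly with that module rather than using fixed weights as in this sketch.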