Multi-Channel Speech Denoising for Machine Ears

Paper Title

Multi-Channel Speech Denoising for Machine Ears

Authors

Han, Cong; Kaya, E. Merve; Hoefer, Kyle; Slaney, Malcolm; Carlile, Simon

Abstract

This work describes a speech denoising system for machine ears that aims to improve speech intelligibility and the overall listening experience in noisy environments. We recorded approximately 100 hours of audio data with reverberation and moderate environmental noise using a pair of microphone arrays placed around each of the two ears, and then mixed the sound recordings to simulate adverse acoustic scenes. We then trained a multi-channel speech denoising network (MCSDN) on the mixtures of recordings. To improve the training, we employed an unsupervised method, the complex angular central Gaussian mixture model (cACGMM), to obtain cleaner speech from the noisy recordings to serve as the learning target. For the inference stage, we propose an MCSDN-Beamforming-MCSDN framework. Subjective evaluation shows that the cACGMM improves the training data, resulting in better noise reduction and higher user preference, and that the entire system improves intelligibility and the listening experience in noisy situations.
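The abstract says recordings were mixed to simulate adverse acoustic scenes but does not say how. As an illustration only, the sketch below shows one common approach: scaling a multi-channel noise recording so that the mixture reaches a chosen signal-to-noise ratio. The function name `mix_at_snr`, the 4-channel layout, and the 5 dB target are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to a target SNR in dB.

    Both inputs have shape (channels, samples). A single gain is shared by
    all channels, so level differences between microphones are preserved.
    Hypothetical helper for illustration; not the authors' procedure.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.shape != noise.shape:
        raise ValueError("speech and noise must have the same shape")

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise recording is silent")

    # Choose the gain so that speech_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


# Usage with synthetic stand-ins for recordings (assumed: 4 channels, 1 s at 16 kHz).
rng = np.random.default_rng(0)
speech = rng.standard_normal((4, 16000))
noise = 3.0 * rng.standard_normal((4, 16000))
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
```

Sharing one gain across channels keeps the spatial cues of the noise intact, which matters for any later multi-channel processing such as beamforming.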
