Paper title
A study on more realistic room simulation for far-field keyword spotting
Paper authors
Paper abstract
We investigate the impact of more realistic room simulation for training far-field keyword spotting systems without fine-tuning on in-domain data. To this end, we study the impact of incorporating the following factors in the room impulse response (RIR) generation: air absorption, surface- and frequency-dependent coefficients of real materials, and stochastic ray tracing. Through an ablation study, a wake word task is used to measure the impact of these factors in comparison with a ground-truth set of measured RIRs. On a hold-out set of re-recordings under clean and noisy far-field conditions, we demonstrate up to $35.8\%$ relative improvement over the commonly used (single absorption coefficient) image source method. Source code is made available in the Pyroomacoustics package, allowing others to incorporate these techniques in their work.
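
As a rough illustration of why the factors named in the abstract can matter, the sketch below compares a single absorption coefficient against per-band material coefficients and an added air-absorption term. It uses Sabine's reverberation-time formula, RT60 = 0.161 V / (A + 4mV), which is a textbook approximation and not the image source or ray tracing simulation studied in the paper. The room size, the coefficient tables, the air attenuation values, and the names `sabine_rt60` and `compare_models` are all hypothetical and chosen only for this example.

```python
# Sabine reverberation-time estimate per octave band (illustrative sketch only).
# All numeric values below are hypothetical placeholders, not data from the paper.

SABINE_CONSTANT = 0.161  # s/m, standard Sabine constant for air at about 20 degrees C


def sabine_rt60(volume, surface_areas, absorption, air_attenuation=0.0):
    """Return the Sabine RT60 in seconds for one frequency band.

    volume          -- room volume in m^3
    surface_areas   -- dict: surface name -> area in m^2
    absorption      -- dict: surface name -> energy absorption coefficient (0..1)
    air_attenuation -- energy attenuation constant m of air in 1/m (0 disables it)
    """
    surface_term = sum(area * absorption[name] for name, area in surface_areas.items())
    air_term = 4.0 * air_attenuation * volume
    return SABINE_CONSTANT * volume / (surface_term + air_term)


# Hypothetical 6 m x 4 m x 3 m shoebox room.
LENGTH, WIDTH, HEIGHT = 6.0, 4.0, 3.0
VOLUME = LENGTH * WIDTH * HEIGHT
AREAS = {
    "floor": LENGTH * WIDTH,
    "ceiling": LENGTH * WIDTH,
    "walls": 2.0 * HEIGHT * (LENGTH + WIDTH),
}

# Hypothetical per-band energy absorption coefficients (assumed, not measured).
BANDS_HZ = [500, 1000, 2000, 4000, 8000]
MATERIALS = {
    "floor": [0.10, 0.20, 0.35, 0.50, 0.60],    # carpet-like
    "ceiling": [0.05, 0.05, 0.06, 0.07, 0.08],  # plaster-like
    "walls": [0.03, 0.04, 0.05, 0.06, 0.07],    # brick-like
}
# Hypothetical air attenuation constants in 1/m (rough orders of magnitude only).
AIR_M = [0.0005, 0.001, 0.002, 0.006, 0.02]

# Baseline: one coefficient for every surface and every band.
SINGLE_COEFFICIENT = 0.10


def compare_models():
    """Return (band, rt60_single, rt60_materials, rt60_materials_plus_air) tuples."""
    flat = {name: SINGLE_COEFFICIENT for name in AREAS}
    rows = []
    for i, band in enumerate(BANDS_HZ):
        per_band = {name: coeffs[i] for name, coeffs in MATERIALS.items()}
        rows.append((
            band,
            sabine_rt60(VOLUME, AREAS, flat),
            sabine_rt60(VOLUME, AREAS, per_band),
            sabine_rt60(VOLUME, AREAS, per_band, AIR_M[i]),
        ))
    return rows


if __name__ == "__main__":
    for band, t_flat, t_mat, t_air in compare_models():
        print(f"{band:>5} Hz  single: {t_flat:.2f} s  "
              f"materials: {t_mat:.2f} s  materials + air: {t_air:.2f} s")
```

With a single coefficient the estimate is identical in every band, whereas the per-band coefficients and the air term make the decay frequency dependent, with the air term shortening the high-frequency bands the most. A full simulation with these factors would be run through the Pyroomacoustics package that the abstract points to.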