Paper Title
End-to-End Multi-Look Keyword Spotting
Authors
Abstract
The performance of keyword spotting (KWS), measured in false alarms and false rejects, degrades significantly under far-field and noisy conditions. In this paper, we propose a multi-look neural network model for speech enhancement that simultaneously steers toward multiple sampled look directions. The multi-look enhancement is then jointly trained with KWS to form an end-to-end KWS model that integrates the enhanced signals from multiple look directions and uses an attention mechanism to dynamically shift the model's attention to the reliable sources. We demonstrate, on our large noisy and far-field evaluation sets, that the proposed approach significantly improves KWS performance over both a baseline KWS system and a recent beamformer-based multi-beam KWS system.
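
The attention step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the dot-product scoring, and the `scoring_vector` parameter are assumptions made for illustration only. The sketch scores the enhanced feature vector from each look direction for a single frame, normalizes the scores with a softmax, and returns the weighted sum, so that looks with higher scores contribute more to the fused feature passed on to the keyword detector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_looks(look_features, scoring_vector):
    """Fuse per-look feature vectors for one frame into a single vector.

    look_features: list of K feature vectors, one per look direction,
        each of length D (hypothetical enhanced features).
    scoring_vector: hypothetical learned vector of length D; the dot
        product with each look's features gives that look's score.
    Returns (weights, fused), where weights sums to 1 over the K looks.
    """
    scores = [
        sum(f * w for f, w in zip(features, scoring_vector))
        for features in look_features
    ]
    weights = softmax(scores)
    dim = len(look_features[0])
    fused = [
        sum(weights[k] * look_features[k][d] for k in range(len(look_features)))
        for d in range(dim)
    ]
    return weights, fused


# Three look directions with 2-dimensional toy features; the second
# look has the largest score and therefore receives the largest weight.
looks = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
weights, fused = attend_over_looks(looks, [1.0, 1.0])
```

In a trained end-to-end model the scoring parameters would be learned jointly with the enhancement and KWS components rather than fixed by hand as in this toy example.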