Title
Streaming keyword spotting on mobile devices
Authors
Abstract
In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. Converting a neural network (NN) model from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manually rewriting the model. We address this by designing a TensorFlow/Keras-based library that converts non-streaming models to streaming ones automatically, with minimal effort. With this library we benchmark multiple KWS models in both streaming and non-streaming modes on mobile phones and demonstrate different tradeoffs between latency and accuracy. We also explore novel KWS models with multi-head attention, which reduce the classification error by 10% compared with the state of the art on the Google Speech Commands dataset V2. The streaming library, together with all experiments, is open-sourced.
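To make the streaming/non-streaming distinction concrete, here is a minimal, framework-free sketch. It is not the library's API. It assumes a single 1D convolution with no padding and uses hypothetical names (`conv1d_valid`, `StreamingConv1d`, `buffer`). The non-streaming function sees the whole sequence at once. The streaming class receives one frame at a time and keeps a small state buffer holding the last `len(kernel)` frames, so its incremental outputs match the non-streaming result.

```python
from collections import deque
from typing import List, Optional


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Non-streaming mode: the whole input sequence is available at once."""
    k = len(kernel)
    return [
        sum(signal[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


class StreamingConv1d:
    """Streaming mode (hypothetical sketch): frames arrive one at a time.

    A fixed-size state buffer keeps the most recent len(kernel) frames,
    which is all the context a convolution needs to emit its next output.
    """

    def __init__(self, kernel: List[float]) -> None:
        self.kernel = list(kernel)
        # deque with maxlen drops the oldest frame automatically.
        self.buffer: deque = deque(maxlen=len(self.kernel))

    def step(self, frame: float) -> Optional[float]:
        self.buffer.append(frame)
        if len(self.buffer) < len(self.kernel):
            return None  # not enough context accumulated yet
        return sum(x * w for x, w in zip(self.buffer, self.kernel))


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, -1.0, 2.0]

full = conv1d_valid(signal, kernel)
streaming = StreamingConv1d(kernel)
incremental = [y for y in (streaming.step(x) for x in signal) if y is not None]
print(full)         # [4.5, 6.0, 7.5]
print(incremental)  # [4.5, 6.0, 7.5]
```

The design choice illustrated is that streaming inference trades recomputation over the whole input for a small amount of persistent state per layer. Each new frame costs one kernel-sized dot product instead of a full pass over the sequence, which is why streaming mode can lower latency on mobile devices.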