Streaming keyword spotting on mobile devices

Paper title

Streaming keyword spotting on mobile devices

Authors

Oleg Rybakov, Natasha Kononenko, Niranjan Subrahmanya, Mirko Visontai, Stella Laurenzo

Abstract

In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. Converting a neural network (NN) model from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a TensorFlow/Keras-based library that allows automatic conversion of non-streaming models to streaming ones with minimum effort. With this library we benchmark multiple KWS models in both streaming and non-streaming modes on mobile phones and demonstrate different tradeoffs between latency and accuracy. We also explore novel KWS models with multi-head attention, which reduce the classification error over the state of the art by 10% on the Google Speech Commands dataset V2. The streaming library with all experiments is open-sourced.
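The difference between the two modes described in the abstract can be illustrated with a single causal 1-D convolution. The sketch below is a minimal, pure-Python illustration and does not use the paper's library or its API; the names `conv_non_streaming` and `StreamingConv` are hypothetical. It assumes the streaming layer receives one sample per call and keeps the most recent inputs in an internal state buffer, so that its incremental outputs match the whole-sequence result.

```python
from collections import deque


def conv_non_streaming(x, kernel):
    """Non-streaming mode: process the whole input sequence at once.

    Causal convolution in the neural-network sense (no kernel flip);
    the input is left-padded with zeros so the output has the same length.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(w * v for w, v in zip(kernel, padded[t:t + k]))
        for t in range(len(x))
    ]


class StreamingConv:
    """Streaming mode: one sample per call, with state kept between calls.

    Hypothetical illustration only; not the API of the paper's library.
    """

    def __init__(self, kernel):
        self.kernel = list(kernel)
        # The state holds the last len(kernel) samples, zero-initialised,
        # which plays the role of the left padding in the non-streaming case.
        self.state = deque([0.0] * len(kernel), maxlen=len(kernel))

    def step(self, sample):
        # Appending to a full deque drops the oldest sample.
        self.state.append(sample)
        return sum(w * v for w, v in zip(self.kernel, self.state))


x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
kernel = [0.25, 0.5, 0.25]

full = conv_non_streaming(x, kernel)

layer = StreamingConv(kernel)
streamed = [layer.step(sample) for sample in x]

# Both modes produce the same outputs; only the way the input arrives differs.
assert full == streamed
```

Rewriting every layer by hand in this way is the manual effort that the abstract says the library is designed to avoid.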
