Paper Title

YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers

Paper Authors

Kwon, Young D., Chauhan, Jagmohan, Mascolo, Cecilia

Paper Abstract

With the advancement of Deep Neural Networks (DNNs) and the large amounts of sensor data produced by Internet of Things (IoT) systems, the research community has worked to reduce the computational and resource demands of DNNs so that they can run on low-resource microcontrollers (MCUs). However, most current work in embedded deep learning focuses on solving a single task efficiently, while the multi-tasking nature and application requirements of IoT devices demand systems that can handle a diverse range of tasks (activity, voice, and context recognition) with input from a variety of sensors, simultaneously. In this paper, we propose YONO, a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching for dissimilar multi-task learning on MCUs. We first adopt PQ to learn codebooks that store the weights of different models. In addition, we propose a novel network optimization and heuristic method to maximize the compression rate and minimize the accuracy loss. We then develop an online component of YONO for efficient model execution and switching between multiple tasks on an MCU at run time without relying on an external storage device. YONO shows remarkable performance: it can compress multiple heterogeneous models by up to 12.37$\times$ with negligible or no loss of accuracy. Besides, YONO's online component enables efficient execution (latency of 16-159 ms per operation) and reduces model loading/switching latency and energy consumption by 93.3-94.5% and 93.9-95.0%, respectively, compared to external storage access. Interestingly, YONO can compress various architectures trained on datasets that were not seen during YONO's offline codebook learning phase, demonstrating the generalizability of our method. To summarize, YONO shows great potential and opens further doors to enabling multi-task learning systems on extremely resource-constrained devices.
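The abstract's core idea, learning shared codebooks with product quantization (PQ) and representing each model's weights as compact codes into those codebooks, can be illustrated with a minimal sketch. This is not YONO's implementation; the function names, default parameters, and the use of scikit-learn k-means below are assumptions made purely for illustration.

# Minimal product-quantization sketch (illustrative only, not YONO's code).
# Each row of `weights` is treated as one weight vector. The vector is split
# into sub-vectors, and one k-means codebook is learned per sub-space; the
# model is then stored as small integer codes plus the shared codebooks.
import numpy as np
from sklearn.cluster import KMeans

def pq_compress(weights, num_subspaces=4, codebook_size=16):
    """Compress a 2-D weight matrix (rows = weight vectors) with PQ.
    Assumes weights.shape[1] divides evenly and rows >= codebook_size."""
    n, d = weights.shape
    assert d % num_subspaces == 0, "dimension must split evenly into sub-spaces"
    sub_dim = d // num_subspaces
    codebooks, codes = [], []
    for s in range(num_subspaces):
        sub = weights[:, s * sub_dim:(s + 1) * sub_dim]
        km = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(sub)
        codebooks.append(km.cluster_centers_)            # (codebook_size, sub_dim)
        codes.append(km.predict(sub).astype(np.uint8))   # one byte per sub-vector
    return codebooks, np.stack(codes, axis=1)            # codes: (n, num_subspaces)

def pq_decompress(codebooks, codes):
    """Reconstruct approximate weights by looking up each code in its codebook."""
    return np.hstack([codebooks[s][codes[:, s]] for s in range(len(codebooks))])

In this toy setting the storage cost drops from n*d floats to n*num_subspaces one-byte codes plus the shared codebooks; YONO additionally shares codebooks across multiple heterogeneous models and applies further optimization and heuristics, which this sketch does not capture.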
