Paper Title

Ordering Chaos: Memory-Aware Scheduling of Irregularly Wired Neural Networks for Edge Devices

Paper Authors

Byung Hoon Ahn, Jinwon Lee, Jamie Menjay Lin, Hsin-Pai Cheng, Jilei Hou, Hadi Esmaeilzadeh

Paper Abstract

Recent advances demonstrate that irregularly wired neural networks, produced by Neural Architecture Search (NAS) and Random Wiring, can not only automate the design of deep neural networks but also emit models that outperform previous manual designs. These designs are especially effective when architectures must meet hard resource constraints (memory, MACs, ...), which highlights the importance of this class of neural networks. However, such a move complicates the previously streamlined pattern of execution. In fact, one of the main challenges is that the execution order of the nodes in such a network significantly affects the memory footprint of the intermediate activations. Current compilers do not schedule with activation memory footprint in mind, significantly increasing its peak relative to the optimum and rendering these models inapplicable to edge devices. To address this standing issue, we present a memory-aware compiler, dubbed SERENITY, which uses dynamic programming to find a schedule with the optimal memory footprint. Our solution also comprises a graph rewriting technique that enables further reduction beyond this optimum. As such, SERENITY achieves the optimal peak memory, and graph rewriting improves on it: against TensorFlow Lite, the dynamic-programming-based scheduler alone yields a 1.68x reduction in peak memory, and graph rewriting raises this to 1.86x, at less than one minute of compilation overhead.
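To make the scheduling idea concrete, here is a minimal Python sketch of a dynamic-programming scheduler over a toy dataflow graph. The graph, activation sizes, and function names are illustrative assumptions, not SERENITY's actual implementation; the point it demonstrates is the memoization state: peak memory depends only on which nodes have already executed, so every ordering that reaches the same set of executed nodes can share one best-known result.

from functools import lru_cache

# Toy dataflow graph (illustrative, not from the paper):
# node -> (activation size, tuple of input nodes).
GRAPH = {
    "in":   (4, ()),
    "conv": (8, ("in",)),
    "skip": (4, ("in",)),
    "add":  (8, ("conv", "skip")),
    "out":  (2, ("add",)),
}

# Who reads each node's output, so we know when it can be freed.
CONSUMERS = {n: tuple(m for m, (_, ins) in GRAPH.items() if n in ins)
             for n in GRAPH}

def resident(done):
    """Memory still allocated once every node in `done` has run:
    outputs with a pending consumer, plus graph outputs (never freed here)."""
    return sum(size for n, (size, _) in GRAPH.items() if n in done
               and (not CONSUMERS[n] or any(c not in done for c in CONSUMERS[n])))

@lru_cache(maxsize=None)
def best(done):
    """Minimum achievable peak memory over the remaining nodes, plus one
    order that achieves it. Memoizing on the *set* of executed nodes is
    the dynamic-programming step: the order within `done` is irrelevant."""
    if len(done) == len(GRAPH):
        return 0, ()
    champion = (float("inf"), ())
    for v, (size, ins) in GRAPH.items():
        if v in done or any(p not in done for p in ins):
            continue                       # v already ran, or an input is missing
        peak_now = resident(done) + size   # live inputs + v's fresh output
        peak_rest, order = best(done | {v})
        champion = min(champion, (max(peak_now, peak_rest), (v,) + order))
    return champion

peak, order = best(frozenset())
print(f"optimal peak = {peak}, schedule = {' -> '.join(order)}")

On real irregularly wired graphs the number of reachable sets can still grow exponentially, and going below this scheduling optimum requires rewriting the graph itself, as the abstract notes; both concerns are beyond this sketch.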
