Paper Title

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

Paper Authors

Renzo Andri, Geethan Karunaratne, Lukas Cavigelli, Luca Benini

Abstract


Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm$^2$ binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering, latch-based memories, and voltage scaling, a throughput of 241 GOPS is achieved while consuming just 1.1 mW at 0.4 V/154 MHz during inference of binary CNNs with up to 7x7 kernels, leading to a peak core energy efficiency of 223 TOPS/W. ChewBaccaNN's flexibility allows it to run a much wider range of binary CNNs than other accelerators, drastically improving the accuracy-energy trade-off beyond what can be captured by the TOPS/W metric. In fact, it can perform CIFAR-10 inference at 86.8% accuracy with merely 1.3 $\mu J$, thus exceeding the accuracy while lowering the energy cost by 2.8x compared to even the most efficient and much larger analog processing-in-memory devices, while keeping the flexibility of running larger CNNs for higher accuracy when needed. It also runs a binary ResNet-18 trained on the 1000-class ILSVRC dataset and improves the energy efficiency by 4.4x over accelerators of similar flexibility. Furthermore, it can perform inference on a binarized ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy with only 3.0 mJ/frame -- at an accuracy drop of merely 1.8% from the full-precision ResNet-18.
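As a quick sanity check of the figures quoted above, the core energy efficiency follows directly from the reported throughput and power. A minimal sketch (the 241 GOPS and 1.1 mW numbers are taken from the abstract; the assumption here is that the 223 TOPS/W peak corresponds to a slightly different operating point than this one):

```python
# Recompute core energy efficiency from the abstract's numbers.
throughput_gops = 241   # GOPS at 0.4 V / 154 MHz (from the abstract)
power_mw = 1.1          # mW at the same operating point (from the abstract)

# TOPS/W = (operations per second) / (watts), expressed in tera-ops.
efficiency_tops_w = (throughput_gops * 1e9) / (power_mw * 1e-3) / 1e12
print(f"{efficiency_tops_w:.0f} TOPS/W")
```

This gives roughly 219 TOPS/W at the quoted 0.4 V/154 MHz point, consistent with the 223 TOPS/W peak figure.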
