Paper Title

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

Paper Authors

Renzo Andri, Geethan Karunaratne, Lukas Cavigelli, Luca Benini

Abstract


Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm$^2$ binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering, latch-based memories, and voltage scaling, a throughput of 241 GOPS is achieved while consuming just 1.1 mW at 0.4 V/154 MHz during inference of binary CNNs with up to 7x7 kernels, leading to a peak core energy efficiency of 223 TOPS/W. ChewBaccaNN's flexibility allows it to run a much wider range of binary CNNs than other accelerators, drastically improving the accuracy-energy trade-off beyond what can be captured by the TOPS/W metric. In fact, it can perform CIFAR-10 inference at 86.8% accuracy with merely 1.3 $\mu J$, thus exceeding the accuracy while lowering the energy cost by 2.8x compared to even the most efficient and much larger analog processing-in-memory devices, while keeping the flexibility of running larger CNNs for higher accuracy when needed. It also runs a binary ResNet-18 trained on the 1000-class ILSVRC dataset and improves the energy efficiency by 4.4x over accelerators of similar flexibility. Furthermore, it can perform inference on a binarized ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy with only 3.0 mJ/frame -- at an accuracy drop of merely 1.8% from the full-precision ResNet-18.
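As a quick sanity check of the figures quoted above, the core energy efficiency follows directly from the reported throughput and power. A minimal sketch (the 241 GOPS and 1.1 mW numbers are taken from the abstract; the assumption here is that the 223 TOPS/W peak corresponds to a slightly different operating point than this one):

```python
# Recompute core energy efficiency from the abstract's numbers.
throughput_gops = 241   # GOPS at 0.4 V / 154 MHz (from the abstract)
power_mw = 1.1          # mW at the same operating point (from the abstract)

# TOPS/W = (operations per second) / (watts), expressed in tera-ops.
efficiency_tops_w = (throughput_gops * 1e9) / (power_mw * 1e-3) / 1e12
print(f"{efficiency_tops_w:.0f} TOPS/W")
```

This gives roughly 219 TOPS/W at the quoted 0.4 V/154 MHz point, consistent with the 223 TOPS/W peak figure.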
