Paper Title

CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for Energy-Efficient Low-precision Deep Convolutional Neural Networks

Paper Authors

Muhammad Abdullah Hanif, Giuseppe Maria Sarda, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique

Paper Abstract

In today's era of smart cyber-physical systems, Deep Neural Networks (DNNs) have become ubiquitous due to their state-of-the-art performance in complex real-world applications. The high computational complexity of these networks, which translates to increased energy consumption, is the foremost obstacle towards deploying large DNNs in resource-constrained systems. Fixed-Point (FP) implementations achieved through post-training quantization are commonly used to curtail the energy consumption of these networks. However, the uniform quantization intervals in FP restrict the bit-width of data structures to large values due to the need to represent most of the numbers with sufficient resolution and avoid high quantization errors. In this paper, we leverage the key insight that (in most of the scenarios) DNN weights and activations are mostly concentrated near zero and only a few of them have large magnitudes. We propose CoNLoCNN, a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting: (1) non-uniform quantization of weights enabling simplification of complex multiplication operations; and (2) correlation between activation values enabling partial compensation of quantization errors at low cost without any run-time overheads. To significantly benefit from non-uniform quantization, we also propose a novel data representation format, Encoded Low-Precision Binary Signed Digit, to compress the bit-width of weights while ensuring direct use of the encoded weight for processing using a novel multiply-and-accumulate (MAC) unit design.
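The two ideas in the abstract lend themselves to short illustrations. First, because weights concentrate near zero, a non-uniform quantizer can place most of its levels where the mass is. The sketch below uses signed power-of-two levels as a generic stand-in; the abstract does not specify CoNLoCNN's exact level placement, so `quantize_pow2`, `n_levels`, and `w_max` are hypothetical names and parameters, not the paper's scheme:

```python
import numpy as np

def quantize_pow2(w, n_levels=8, w_max=1.0):
    """Snap each weight to the nearest signed power-of-two level.

    Levels are {0} U {w_max / 2^k}, so resolution is finest near zero,
    matching the near-zero concentration of DNN weights. This is a
    hypothetical stand-in for the paper's quantizer, not its exact scheme.
    """
    levels = np.concatenate(
        ([0.0], w_max * 2.0 ** -np.arange(n_levels - 1, dtype=float))
    )
    mag = np.abs(w)
    # Distance from each |weight| to every level; pick the nearest.
    idx = np.argmin(np.abs(mag[..., None] - levels), axis=-1)
    return np.sign(w) * levels[idx]

# Small weights keep fine resolution; large outliers snap to coarse levels.
w = np.array([0.02, -0.05, 0.11, -0.48, 0.93])
print(quantize_pow2(w))  # [ 0.015625 -0.0625  0.125  -0.5  1. ]
```

Second, representing a weight with binary signed digits (digits in {-1, 0, +1}) turns each multiplication into a few shifts and adds or subtracts, which is the kind of simplification a signed-digit MAC unit exploits. A minimal sketch using the standard canonical-signed-digit (CSD) encoding follows; the paper's Encoded Low-Precision Binary Signed Digit format is a compressed encoding whose details are in the paper, and `to_csd` / `mac_with_csd` are illustrative names only:

```python
def to_csd(n, n_digits=8):
    """Canonical Signed Digit encoding of a non-negative integer:
    digits in {-1, 0, +1} with no two adjacent non-zeros (LSB first).
    Standard technique; the paper's encoding format may differ."""
    digits = []
    for _ in range(n_digits):
        if n & 1:
            d = 2 - (n & 3)   # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def mac_with_csd(x, csd_digits, acc=0):
    """Multiply x by a CSD-encoded weight using only shifts and
    adds/subtracts, then accumulate -- the simplification a
    signed-digit MAC unit targets (hardware details are in the paper)."""
    for k, d in enumerate(csd_digits):
        if d == 1:
            acc += x << k
        elif d == -1:
            acc -= x << k
    return acc

w = 7                      # binary 0111 -> CSD: +2^3 - 2^0 (two non-zeros)
assert mac_with_csd(3, to_csd(w)) == 3 * w
```

Note how CSD reduces the digit count: 7 needs three non-zero bits in plain binary but only two signed digits, so the MAC loop does two shift-add operations instead of three.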
