Paper Title

Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)

Authors

Sangeetha Siddegowda, Marios Fournarakis, Markus Nagel, Tijmen Blankevoort, Chirag Patel, Abhijit Khobare

Abstract

While neural networks have advanced the frontiers in many machine learning applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is vital to integrating modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation. In this white paper, we present an overview of neural network quantization using AI Model Efficiency Toolkit (AIMET). AIMET is a library of state-of-the-art quantization and compression algorithms designed to ease the effort required for model optimization and thus drive the broader AI ecosystem towards low latency and energy-efficient inference. AIMET provides users with the ability to simulate as well as optimize PyTorch and TensorFlow models. Specifically for quantization, AIMET includes various post-training quantization (PTQ, cf. chapter 4) and quantization-aware training (QAT, cf. chapter 5) techniques that guarantee near floating-point accuracy for 8-bit fixed-point inference. We provide a practical guide to quantization via AIMET by covering PTQ and QAT workflows, code examples and practical tips that enable users to efficiently and effectively quantize models using AIMET and reap the benefits of low-bit integer inference.
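The "additional noise" the abstract refers to comes from rounding continuous values onto a fixed integer grid. A minimal uniform 8-bit quantizer in plain Python illustrates the effect (this is only a conceptual sketch, not AIMET's actual API; AIMET's simulation and optimization interfaces are covered in the later chapters):

```python
def quantize_dequantize(values, num_bits=8):
    """Uniformly quantize a list of floats to num_bits integers and map
    them back to float, exposing the rounding noise that quantization adds."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Round each value to the nearest point on the integer grid...
    ints = [round((v - lo) / scale) for v in values]
    # ...then map back to floating point ("dequantize").
    return [q * scale + lo for q in ints]

weights = [-0.52, -0.10, 0.03, 0.47, 0.91]
deq = quantize_dequantize(weights)
# Each reconstructed value differs from the original by at most half a
# quantization step (scale / 2); this bounded rounding error is the noise
# that PTQ and QAT techniques aim to keep from degrading model accuracy.
```

Because the error per value is bounded by half a step, the noise shrinks as the bit width grows, and PTQ/QAT methods work by choosing quantization parameters (and, for QAT, weights) that keep this noise from accumulating into an accuracy loss.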
