Paper Title

4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Authors

Shaojin Ding, Phoenix Meadowlark, Yanzhang He, Lukasz Lew, Shivani Agrawal, Oleg Rybakov

Abstract

Reducing the latency and model size has always been a significant research problem for live Automatic Speech Recognition (ASR) application scenarios. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most of the existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study, we propose to develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate our proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech dataset, and obtained a lossless 4-bit Conformer model with 5.8x size reduction compared to the float32 model. Following this, we for the first time investigated and revealed the viability of 4-bit quantization on a practical ASR system that is trained with large-scale datasets, and produced a lossless Conformer ASR model with mixed 4-bit and 8-bit weights that has 5x size reduction compared to the float32 model.
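The abstract describes quantization aware training with different precisions for weights and activations. As a rough illustrative sketch only (not the paper's native-integer implementation; the helper name fake_quantize and its details are assumptions for this example), the snippet below shows the symmetric per-tensor fake quantization commonly used to simulate low-bit weights during training:

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    """Symmetric per-tensor fake quantization (illustrative sketch).

    Quantizes weights to signed `num_bits` integers and dequantizes
    back to float for the forward pass. In quantization aware training
    the rounding step is typically bypassed in the backward pass via a
    straight-through estimator.
    """
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 7 for 4-bit signed
    scale = max(np.max(np.abs(w)) / qmax, 1e-8)     # per-tensor scale, guard against 0
    q = np.clip(np.round(w / scale), -qmax, qmax)   # integer codes in [-qmax, qmax]
    return q * scale                                 # dequantized weights

# Example: quantize a float32 weight matrix to 4-bit precision.
w = np.random.randn(4, 4).astype(np.float32)
w_q = fake_quantize(w, num_bits=4)
print(np.abs(w - w_q).max())  # worst-case quantization error
```

In this framing, the mixed 4-bit and 8-bit configuration mentioned in the abstract would correspond to applying the same scheme per layer with num_bits set to 4 or 8 depending on the layer's sensitivity; how layers are assigned is a detail of the paper, not of this sketch.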
