Paper title
Quantisation and Pruning for Neural Network Compression and Regularisation
Paper authors
Paper abstract
Deep neural networks are typically too computationally expensive to run in real-time on consumer-grade hardware and low-powered devices. In this paper, we investigate reducing the computational and memory requirements of neural networks through network pruning and quantisation. We examine their efficacy on large networks like AlexNet compared to recent compact architectures: ShuffleNet and MobileNet. Our results show that pruning and quantisation compress these networks to less than half their original size and improve their efficiency, particularly on MobileNet, with a 7x speedup. We also demonstrate that pruning, in addition to reducing the number of parameters in a network, can help correct overfitting.
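For readers unfamiliar with the two techniques named in the abstract, the sketch below illustrates magnitude-based weight pruning and uniform affine quantisation applied to a single weight matrix. This is a minimal numpy illustration under assumed settings (50% sparsity, 8-bit codes), not the paper's implementation or its reported configuration.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights.

    `sparsity` is the fraction of weights removed; 0.5 is an
    illustrative value, not a setting taken from the paper.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask

def quantise(weights, bits=8):
    """Uniform affine quantisation to `bits`-bit integer codes.

    Returns the dequantised weights so the approximation error
    introduced by compression is directly visible.
    """
    levels = 2 ** bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    codes = np.round((weights - w_min) / scale)  # integers in [0, levels]
    return codes * scale + w_min                 # dequantised approximation

# Example: prune, then quantise, a random weight matrix.
w = np.random.randn(64, 64).astype(np.float32)
w_compressed = quantise(magnitude_prune(w, sparsity=0.5), bits=8)
print("nonzero fraction after pruning:",
      np.count_nonzero(magnitude_prune(w)) / w.size)
```

In practice the pruned zeros are stored in a sparse format and the integer codes replace 32-bit floats, which is where the size reduction the abstract describes comes from.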