Paper Title

FreezeNet: Full Performance by Reduced Storage Costs

Authors

Paul Wimmer, Jens Mehnert, Alexandru Condurache

Abstract

Pruning generates sparse networks by setting parameters to zero. In this work we improve one-shot pruning methods, applied before training, without adding any additional storage costs while preserving the sparse gradient computations. The main difference to pruning is that we do not sparsify the network's weights but learn just a few key parameters and keep the other ones fixed at their randomly initialized values. This mechanism is called freezing the parameters. These frozen weights can be stored efficiently with a single 32-bit random seed. The parameters to be frozen are determined in a single shot by one forward and backward pass before training starts. We call the introduced method FreezeNet. In our experiments we show that FreezeNets achieve good results, especially for extreme freezing rates. Freezing weights preserves the gradient flow throughout the network; consequently, FreezeNets train better and have an increased capacity compared to their pruned counterparts. On the classification tasks MNIST and CIFAR-10/100 we outperform SNIP, the best reported one-shot pruning method applied before training in this setting. On MNIST, FreezeNet achieves 99.2% of the performance of the baseline LeNet-5-Caffe architecture, while compressing the number of trained and stored parameters by a factor of 157.
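The abstract describes the mechanism only in prose; the following is a minimal, illustrative PyTorch-style sketch of the idea (not the authors' implementation). It assumes a `model`, a `loss_fn` and one data batch `(x0, y0)`; the names `compute_freeze_masks`, `masked_sgd_step`, `build_model` and the constant `SEED` are hypothetical, and the saliency score |w · ∂L/∂w| stands in for the SNIP-style single-pass criterion referenced in the abstract.

```python
# Sketch (not the authors' code) of the FreezeNet idea described above:
# 1) initialize all weights from a known 32-bit seed,
# 2) run ONE forward/backward pass and rank weights by a saliency score |w * grad|,
# 3) train only the top-k weights; the rest stay frozen at their seeded init values
#    and can later be regenerated from the seed alone.
import torch

def compute_freeze_masks(model, loss_fn, x, y, keep_ratio=0.01):
    """Return a {parameter: bool mask} dict; True = trainable, False = frozen."""
    loss = loss_fn(model(x), y)
    params = list(model.parameters())
    grads = torch.autograd.grad(loss, params)          # the single backward pass
    scores = [(p * g).abs().flatten() for p, g in zip(params, grads)]
    all_scores = torch.cat(scores)
    k = max(1, int(keep_ratio * all_scores.numel()))
    threshold = torch.topk(all_scores, k).values.min()  # keep the k most salient weights
    return {p: (p * g).abs() >= threshold for p, g in zip(params, grads)}

def masked_sgd_step(model, masks, lr=0.1):
    """Apply a plain SGD update only to the non-frozen entries."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= lr * p.grad * masks[p]             # frozen entries keep their init value

# Usage sketch:
#   torch.manual_seed(SEED)        # frozen weights are recoverable from SEED alone
#   model = build_model()          # hypothetical model constructor
#   masks = compute_freeze_masks(model, loss_fn, x0, y0, keep_ratio=1/157)
#   ...training loop: forward pass, loss.backward(), masked_sgd_step(model, masks)
```

Because the frozen entries never change, only the trained weights and the seed need to be stored, which is where the reported compression factor comes from.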
