Paper Title

Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Paper Authors

Milad Alizadeh, Shyam A. Tailor, Luisa M. Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, Yarin Gal

Paper Abstract

Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
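As a rough illustration of the idea described in the abstract, the sketch below shows one way to compute meta-gradient saliency scores: an all-ones pruning mask is applied to the initial weights, a few differentiable SGD steps are simulated, and the gradient of the resulting loss with respect to the mask is taken as the per-weight score. This is a minimal sketch of the general mechanism, not the authors' implementation; the two-layer MLP, the `forward` helper, the learning rate, and the batch handling are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    # Tiny two-layer MLP, used only to keep this sketch self-contained (assumption).
    w1, w2 = params
    return F.relu(x @ w1) @ w2

def meta_gradient_saliency(init_weights, batches, inner_lr=0.1):
    """Score weights by the meta-gradient of the loss after a few simulated
    training steps, taken with respect to a pruning mask applied at
    initialization (the last batch is held out for the meta-loss)."""
    # Start from an all-ones mask, i.e. nothing is pruned yet.
    masks = [torch.ones_like(w, requires_grad=True) for w in init_weights]
    params = [w * m for w, m in zip(init_weights, masks)]

    # Inner loop: differentiable SGD steps on the masked network,
    # keeping the graph so gradients can flow back into the masks.
    for x, y in batches[:-1]:
        loss = F.cross_entropy(forward(params, x), y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - inner_lr * g for p, g in zip(params, grads)]

    # Meta-loss after the inner steps; its gradient w.r.t. the masks reflects
    # how pruning each weight would affect the loss after some training.
    x, y = batches[-1]
    meta_loss = F.cross_entropy(forward(params, x), y)
    meta_grads = torch.autograd.grad(meta_loss, masks)
    return [g.abs() for g in meta_grads]

# Illustrative usage with random data: 10-d inputs, 3 classes (assumed shapes).
w1, w2 = torch.randn(10, 32) * 0.1, torch.randn(32, 3) * 0.1
batches = [(torch.randn(16, 10), torch.randint(0, 3, (16,))) for _ in range(4)]
scores = meta_gradient_saliency([w1, w2], batches)
# Weights with the smallest scores would then be pruned to reach the target sparsity.
```

The key difference from one-step saliency criteria is that the gradient is taken through several optimization steps, so the score reflects how pruning interacts with the training that follows, rather than only the loss at initialization.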
