Paper Title

Exploring Weight Importance and Hessian Bias in Model Pruning

Paper Authors

Mingchen Li, Yahya Sattar, Christos Thrampoulidis, Samet Oymak

Paper Abstract

Model pruning is an essential procedure for building compact and computationally-efficient machine learning models. A key feature of a good pruning algorithm is that it accurately quantifies the relative importance of the model weights. While model pruning has a rich history, we still don't have a full grasp of the pruning mechanics even for relatively simple problems involving linear models or shallow neural nets. In this work, we provide a principled exploration of pruning by building on a natural notion of importance. For linear models, we show that this notion of importance is captured by covariance scaling which connects to the well-known Hessian-based pruning. We then derive asymptotic formulas that allow us to precisely compare the performance of different pruning methods. For neural networks, we demonstrate that the importance can be at odds with larger magnitudes and proper initialization is critical for magnitude-based pruning. Specifically, we identify settings in which weights become more important despite becoming smaller, which in turn leads to a catastrophic failure of magnitude-based pruning. Our results also elucidate that implicit regularization in the form of Hessian structure has a catalytic role in identifying the important weights, which dictate the pruning performance.
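As a concrete handle on the contrast the abstract draws between magnitude-based and Hessian-based (covariance-scaled) importance, the following minimal NumPy sketch is our own illustration, not code from the paper: for a linear model under squared loss the Hessian equals the input covariance, so an OBD-style saliency weighs each coefficient by the variance of its feature, which can reorder which weights look important. Names such as `hessian_importance` and the chosen feature scales are hypothetical.

```python
# Illustrative sketch (assumed setup, not the paper's exact formulation):
# compare magnitude-based and Hessian/covariance-scaled weight importance
# for a linear regression whose features have very different variances.
import numpy as np

rng = np.random.default_rng(0)

n, d = 2000, 3
# Feature standard deviations: a low-variance feature gets a large weight,
# a high-variance feature gets a small weight.
scales = np.array([0.1, 3.0, 1.0])
w_true = np.array([2.0, 0.5, 1.0])
X = rng.normal(size=(n, d)) * scales
y = X @ w_true + 0.1 * rng.normal(size=n)

# Least-squares fit of the linear model.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# For squared loss, the Hessian of the average loss is the covariance X^T X / n.
H = X.T @ X / n

magnitude_importance = np.abs(w_hat)           # magnitude-based saliency
hessian_importance = np.diag(H) * w_hat ** 2   # covariance-scaled (OBD-style) saliency

print("magnitude ranking (most to least important):", np.argsort(-magnitude_importance))
print("Hessian ranking   (most to least important):", np.argsort(-hessian_importance))
# The rankings disagree: the large weight on the low-variance feature (index 0)
# tops the magnitude ranking but contributes little to the loss, while the small
# weight on the high-variance feature (index 1) is the most important by the
# Hessian-based score.
```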
