Paper Title

Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory

Paper Authors

Elaina Chai, Mert Pilanci, Boris Murmann

Paper Abstract

Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the largest eigenvalues that ensures stability. Lastly, we prove that in the first training step, when normalization is needed most, BatchNorm satisfies the same optimization as Normalized Least Mean Square (NLMS), while it continues to approximate this condition in subsequent steps. The analyses provided in this paper lay the groundwork for gaining further insight into the operation of modern neural network structures using adaptive filter theory.
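
The abstract builds on standard results from classical adaptive filter theory: plain LMS updates converge and remain stable only when the step size respects the eigenvalues of the input autocorrelation matrix, while NLMS normalizes each update by the instantaneous input power and thereby removes that dependence. As background only (this is not code from the paper), the minimal NumPy sketch below contrasts the two update rules on a synthetic correlated-input regression; all variable names and the data-generation setup are illustrative assumptions.

```python
import numpy as np

# Illustrative background sketch (not the paper's method): classical LMS vs. NLMS.
# LMS stability requires mu < 2 / lambda_max, where lambda_max is the largest
# eigenvalue of the input autocorrelation matrix R = E[x x^T]. NLMS divides each
# update by the instantaneous input power, so it is stable for 0 < mu < 2.

rng = np.random.default_rng(0)
n, d = 5000, 8
A = rng.normal(size=(d, d))                  # mixing matrix -> correlated inputs
X = rng.normal(size=(n, d)) @ A              # inputs with a wide eigenvalue spread
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)   # noisy linear targets

R = (X.T @ X) / n                            # sample autocorrelation matrix
lam_max = np.linalg.eigvalsh(R).max()
print(f"largest eigenvalue of R: {lam_max:.3f} (LMS needs mu < {2 / lam_max:.3f})")

def lms(mu):
    w = np.zeros(d)
    for x, t in zip(X, y):
        e = t - x @ w
        w += mu * e * x                      # raw update: scales with input power
    return np.linalg.norm(w - w_true)

def nlms(mu, eps=1e-8):
    w = np.zeros(d)
    for x, t in zip(X, y):
        e = t - x @ w
        w += mu * e * x / (x @ x + eps)      # normalized update: power-independent
    return np.linalg.norm(w - w_true)

print("LMS  weight error:", lms(mu=0.5 / lam_max))
print("NLMS weight error:", nlms(mu=0.5))
```

The paper's claim, per the abstract, is that BatchNorm imposes the NLMS normalization condition exactly at the first training step and approximately thereafter; the sketch only illustrates the classical LMS/NLMS distinction that the analysis draws on.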
