Paper Title
Dynamical Isometry for Residual Networks
Paper Authors
Paper Abstract
The training success, training speed, and generalization ability of neural networks rely crucially on the choice of random parameter initialization. It has been shown for multiple architectures that initial dynamical isometry is particularly advantageous. Known initialization schemes for residual blocks, however, miss this property: without Batch Normalization they suffer from degrading separability of different inputs with increasing depth and from training instability, or they lack feature diversity. We propose a random initialization scheme, RISOTTO, that achieves perfect dynamical isometry for residual networks with ReLU activation functions, even at finite depth and width. Unlike other schemes, which are initially biased towards the skip connections, it balances the contributions of the residual and skip branches. In experiments, we demonstrate that in most cases our approach outperforms initialization schemes proposed to make Batch Normalization obsolete, including Fixup and SkipInit, and facilitates stable training. In combination with Batch Normalization, too, we find that RISOTTO often achieves the overall best results.
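To make the notion of perfect dynamical isometry concrete: it means that all singular values of the network's input-output Jacobian equal 1 at initialization, so the block neither amplifies nor attenuates signals or gradients. The sketch below is a minimal, hypothetical illustration of how a ReLU residual block can be initialized as an exact isometry at finite width, using a mirrored ("looks-linear") ReLU construction; it is written in the spirit of the abstract but is not the authors' actual RISOTTO scheme, whose details are not given here. The class name `LooksLinearResidualBlock` and the specific construction are assumptions for illustration only.

```python
# Illustrative sketch (NOT the authors' RISOTTO scheme): a fully connected
# residual block x -> x + fc2(relu(fc1(x))) initialized so that the whole
# block equals an orthogonal map O at initialization, i.e. an exact isometry.
import torch
import torch.nn as nn


class LooksLinearResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Residual branch: dim -> 2*dim -> dim with a ReLU in between.
        self.fc1 = nn.Linear(dim, 2 * dim, bias=False)
        self.fc2 = nn.Linear(2 * dim, dim, bias=False)

        w = nn.init.orthogonal_(torch.empty(dim, dim))  # orthogonal W
        o = nn.init.orthogonal_(torch.empty(dim, dim))  # target block map O
        with torch.no_grad():
            # Mirrored first layer [W; -W]: since relu(Wx) - relu(-Wx) = Wx,
            # the ReLU pair acts exactly linearly on every input.
            self.fc1.weight.copy_(torch.cat([w, -w], dim=0))
            # Second layer [A, -A] with A = (O - I) W^T, so the residual
            # branch computes (O - I)x and the full block computes
            # x + (O - I)x = Ox: all Jacobian singular values equal 1,
            # and the residual branch contributes nontrivially (it is not
            # zeroed out as in Fixup/SkipInit-style initializations).
            a = (o - torch.eye(dim)) @ w.t()
            self.fc2.weight.copy_(torch.cat([a, -a], dim=1))

    def forward(self, x):
        return x + self.fc2(torch.relu(self.fc1(x)))


if __name__ == "__main__":
    torch.manual_seed(0)
    dim = 16
    block = LooksLinearResidualBlock(dim)
    x = torch.randn(dim)
    # Verify dynamical isometry: the Jacobian's singular values are all 1
    # up to numerical precision.
    jac = torch.autograd.functional.jacobian(block, x)
    svals = torch.linalg.svdvals(jac)
    print(svals.min().item(), svals.max().item())
```

The verification step at the bottom is the generic test for dynamical isometry and applies to any initialization scheme: compute the input-output Jacobian and check how tightly its singular values concentrate around 1.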