Title
The effect of Target Normalization and Momentum on Dying ReLU
Authors
Abstract
Optimizing parameters with momentum, normalizing data values, and using rectified linear units (ReLUs) are popular choices in neural network (NN) regression. Despite their popularity, ReLUs can collapse to a constant function and "die", effectively removing their contribution from the model. While some mitigations are known, the underlying reasons why ReLUs die during optimization are poorly understood. In this paper, we consider the effects of target normalization and momentum on dying ReLUs. We find empirically that unit-variance targets are well motivated and that ReLUs die more easily when the target variance approaches zero. To investigate this further, we analyze a discrete-time linear autonomous system, show theoretically how it relates to a model with a single ReLU, and show how common properties can result in dying ReLUs. We also analyze the gradients of a single-ReLU model to identify saddle points and regions corresponding to dying ReLUs, and show how parameters evolve into these regions when momentum is used. Finally, we show empirically that this problem persists, and is aggravated, for deeper models including residual networks.
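The dying-ReLU mechanism the abstract describes can be illustrated with a minimal sketch (not the paper's experiments; data, hyperparameters, and initialization below are illustrative assumptions): a single-ReLU model `yhat = relu(w*x + b)` trained by MSE. Once the pre-activation is negative for every input, the gradient is exactly zero and plain gradient descent can never recover; with near-zero target variance and momentum, the parameters may be carried into that dead region.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def grads(w, b, x, y):
    """MSE gradients of yhat = relu(w*x + b); d relu/dz = 1[z > 0]."""
    z = w * x + b
    err = relu(z) - y
    mask = (z > 0).astype(float)
    gw = 2.0 * np.mean(err * mask * x)
    gb = 2.0 * np.mean(err * mask)
    return gw, gb

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.01 * rng.normal(size=200)   # targets with near-zero variance

# Dead region: if w*x + b <= 0 for every input, the mask is all zeros,
# so both gradients are exactly 0 and the unit can never recover.
w_dead, b_dead = 0.0, -1.0        # pre-activation is -1 for all x
gw, gb = grads(w_dead, b_dead, x, y)
print(gw, gb)                     # 0.0 0.0

# SGD with momentum (hypothetical hyperparameters); with tiny targets,
# momentum can overshoot the parameters into the dead region.
w, b, vw, vb = 1.0, 0.0, 0.0, 0.0
lr, mom = 0.1, 0.9
for _ in range(1000):
    gw, gb = grads(w, b, x, y)
    vw = mom * vw - lr * gw
    vb = mom * vb - lr * gb
    w, b = w + vw, b + vb
print("dead:", bool(np.all(w * x + b <= 0)))
```

The first print shows the defining property of a dead ReLU: a zero gradient everywhere in the dead region, which is why the collapse is permanent under first-order optimization.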