Paper Title

Embedding Principle in Depth for the Loss Landscape Analysis of Deep Neural Networks

Authors

Zhiwei Bai, Tao Luo, Zhi-Qin John Xu, Yaoyu Zhang

Abstract

Understanding the relation between deep and shallow neural networks is extremely important for the theoretical study of deep learning. In this work, we discover an embedding principle in depth: the loss landscape of an NN "contains" all critical points of the loss landscapes of shallower NNs. The key tool for our discovery is the critical lifting operator proposed in this work, which maps any critical point of a network to critical manifolds of any deeper network while preserving the outputs. This principle provides new insights into many widely observed behaviors of DNNs. Regarding the easy training of deep networks, we show that local minima of an NN can be lifted to strict saddle points of a deeper NN. Regarding the acceleration effect of batch normalization, we demonstrate that batch normalization helps avoid the critical manifolds lifted from shallower NNs by suppressing layer linearization. We also prove that increasing training data shrinks the lifted critical manifolds, which can result in acceleration of training, as demonstrated in experiments. Overall, our discovery of the embedding principle in depth uncovers the depth-wise hierarchical structure of the deep learning loss landscape, which serves as a solid foundation for further study of the role of depth in DNNs.
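
To make the output-preserving nature of such a lifting concrete, below is a minimal numerical sketch in NumPy. It is an illustrative construction under simplifying assumptions, not the paper's exact critical lifting operator: a one-hidden-layer ReLU network is deepened by inserting an extra layer initialized as the identity. Because ReLU acts as the identity on the non-negative hidden activations, the inserted layer stays in its linear regime and the deeper network reproduces the shallower network's outputs exactly. All dimensions and the identity initialization are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer ReLU network: y = W2 @ relu(W1 @ x + b1) + b2
d_in, d_hid, d_out = 3, 5, 2
W1 = rng.standard_normal((d_hid, d_in))
b1 = rng.standard_normal(d_hid)
W2 = rng.standard_normal((d_out, d_hid))
b2 = rng.standard_normal(d_out)

def shallow(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Lift to a deeper network by inserting an extra layer initialized as the
# identity (an illustrative choice; the paper's critical lifting is more general).
# ReLU is the identity on non-negative inputs, so the inserted layer is
# effectively linear ("layer linearization") and the output is unchanged.
W_mid, b_mid = np.eye(d_hid), np.zeros(d_hid)

def deeper(x):
    h1 = np.maximum(W1 @ x + b1, 0.0)
    h2 = np.maximum(W_mid @ h1 + b_mid, 0.0)  # equals h1 exactly
    return W2 @ h2 + b2

x = rng.standard_normal(d_in)
print(np.allclose(shallow(x), deeper(x)))  # True: the lifted network preserves outputs

The linearized inserted layer in this sketch is the kind of structure the abstract refers to when noting that batch normalization, by suppressing layer linearization, helps training avoid the critical manifolds lifted from shallower NNs.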
