Paper Title
Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Paper Authors
Paper Abstract
The Hessian captures important properties of the deep neural network loss landscape. Previous works have observed low-rank structure in the Hessians of neural networks. In this paper, we propose a decoupling conjecture that decomposes the layer-wise Hessians of a network as the Kronecker product of two smaller matrices. We can analyze the properties of these smaller matrices and prove the structure of the top eigenspace of random 2-layer networks. The decoupling conjecture has several other interesting implications: top eigenspaces for different models have surprisingly high overlap, and top eigenvectors form low-rank matrices when they are reshaped into the same shape as the corresponding weight matrix. All of these can be verified empirically for deeper networks. Finally, we use the structure of layer-wise Hessians to obtain better explicit generalization bounds for neural networks.
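The reshaped-eigenvector claim above follows directly from Kronecker structure: if a layer-wise Hessian factors as H = A ⊗ B, each eigenvector of H is the Kronecker product of an eigenvector of A and one of B, and reshaping it to the weight-matrix shape yields a rank-1 matrix. A minimal numpy sketch of this implication (the factor matrices here are random PSD stand-ins, not the paper's actual Hessian factors):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Random symmetric positive semi-definite matrix (hypothetical stand-in
    for one Kronecker factor of a layer-wise Hessian)."""
    M = rng.standard_normal((n, n))
    return M @ M.T

A = random_psd(4)   # stand-in for the input-side factor (assumed)
B = random_psd(3)   # stand-in for the output-side factor (assumed)
H = np.kron(A, B)   # Kronecker-structured "layer-wise Hessian", shape (12, 12)

# Top eigenvector of H, reshaped to the corresponding weight-matrix shape (4, 3)
eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
top = eigvecs[:, -1].reshape(4, 3)

# Because the top eigenvector is u_A ⊗ u_B, the reshape is the outer product
# u_A u_B^T, i.e. a rank-1 matrix
rank = np.linalg.matrix_rank(top, tol=1e-8)
print(rank)
```

For a real network the conjecture is only an approximation, so the reshaped top eigenvectors are low-rank rather than exactly rank 1; this toy case shows the exact limiting behavior.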