Paper Title


Asymptotics of Wide Convolutional Neural Networks

Paper Authors

Anders Andreassen, Ethan Dyer

Paper Abstract

Wide neural networks have proven to be a rich class of architectures for both theory and practice. Motivated by the observation that finite width convolutional networks appear to outperform infinite width networks, we study scaling laws for wide CNNs and networks with skip connections. Following the approach of (Dyer & Gur-Ari, 2019), we present a simple diagrammatic recipe to derive the asymptotic width dependence for many quantities of interest. These scaling relationships provide a solvable description for the training dynamics of wide convolutional networks. We test these relations across a broad range of architectures. In particular, we find that the difference in performance between finite and infinite width models vanishes at a definite rate with respect to model width. Nonetheless, this relation is consistent with finite width models generalizing either better or worse than their infinite width counterparts, and we provide examples where the relative performance depends on the optimization details.
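As a rough illustration of the scaling relation described above, the finite-width performance gap can be written schematically as below. This is a sketch only: the assumption that the leading correction enters at first order in the inverse width n follows the perturbative counting of Dyer & Gur-Ari (2019), and the coefficient c_1 (whose sign determines whether finite width helps or hurts) is not fixed by the abstract.

% Schematic only; the 1/n leading order is an assumption from the
% Dyer & Gur-Ari (2019) counting, and c_1 depends on architecture
% and optimization details.
\mathbb{E}\big[L_{\mathrm{test}}(n)\big] \;=\; L_{\mathrm{test}}(\infty) \;+\; \frac{c_1}{n} \;+\; O\!\left(n^{-2}\right)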
