Paper Title
On Sparsity in Overparametrised Shallow ReLU Networks
Paper Authors
Paper Abstract
The analysis of neural network training beyond the linearisation regime remains an outstanding open question, even in the simplest setting of a single hidden layer. The limit of infinitely wide networks provides an appealing route forward through the mean-field perspective, but a key challenge is to bring learning guarantees back to the finite-neuron setting, where practical algorithms operate. Towards closing this gap, in this work we focus on shallow neural networks and study the ability of different regularisation strategies to capture solutions requiring only a finite number of neurons, even in the infinitely wide regime. Specifically, we consider (i) a form of implicit regularisation obtained by injecting noise into the training targets [Blanc et al.~19], and (ii) the variation-norm regularisation [Bach~17], which is compatible with the mean-field scaling. Under mild assumptions on the activation function (satisfied, for instance, by ReLUs), we establish that both schemes are minimised by functions having only a finite number of neurons, irrespective of the amount of overparametrisation. We study the consequences of this property and describe settings in which one form of regularisation is preferable to the other.
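As a point of reference for scheme (ii), a minimal sketch of the finite-width, variation-norm regularised objective could take the following form; the notation (data pairs $(x_j, y_j)_{j\le m}$, width $n$, loss $\ell$, penalty weight $\lambda$) is assumed here for illustration and does not come from the abstract:
\[
\min_{\{(a_i, w_i, b_i)\}_{i=1}^{n}} \ \frac{1}{m}\sum_{j=1}^{m} \ell\!\Big(\tfrac{1}{n}\sum_{i=1}^{n} a_i\,\sigma\big(\langle w_i, x_j\rangle + b_i\big),\, y_j\Big) \;+\; \frac{\lambda}{n}\sum_{i=1}^{n} |a_i|\,\big\|(w_i, b_i)\big\|_2,
\qquad \sigma(t) = \max(t, 0),
\]
where the $1/n$ scaling reflects the mean-field normalisation and the penalty is a finite-neuron proxy for the variation norm of [Bach~17]. Scheme (i), by contrast, runs unregularised training on perturbed targets $\tilde y_j = y_j + \varepsilon_j$, with fresh zero-mean noise $\varepsilon_j$ drawn at each step, so that the regularisation effect is implicit rather than an explicit penalty term.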