Paper Title
Do highly over-parameterized neural networks generalize since bad solutions are rare?
Paper Authors
Paper Abstract
We study over-parameterized classifiers where Empirical Risk Minimization (ERM) for learning leads to zero training error. In these over-parameterized settings, there are many global minima with zero training error, some of which generalize better than others. We show that under certain conditions the fraction of "bad" global minima with a true error larger than ε decays to zero exponentially fast with the number n of training samples. The bound depends on the distribution of the true error over the set of classifier functions used for the given classification problem, and does not necessarily depend on the size or complexity (e.g. the number of parameters) of the classifier function set. This insight may provide a novel perspective on the unexpectedly good generalization of even highly over-parameterized neural networks. We substantiate our theoretical findings through experiments on synthetic data and a subset of MNIST. Additionally, we assess our hypothesis using VGG19 and ResNet18 on a subset of Caltech101.
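The mechanism the abstract describes can be illustrated with a simple back-of-the-envelope simulation. Below is a minimal Monte Carlo sketch (not from the paper; the Beta prior over true errors and the independence assumption are illustrative choices): a classifier with true error p reaches zero training error on n i.i.d. samples with probability roughly (1 - p)^n, so among the classifiers that do interpolate, those with true error above ε become exponentially rare as n grows.

```python
import numpy as np

# Minimal Monte Carlo sketch (not from the paper): draw a pool of hypothetical
# classifiers whose true errors follow an assumed Beta(2, 5) prior, then keep
# only those that interpolate (zero training error) n i.i.d. training points.
# Assuming per-sample errors are independent, a classifier with true error p
# interpolates with probability (1 - p) ** n, so "bad" classifiers
# (true error > eps) are filtered out exponentially fast as n grows.

rng = np.random.default_rng(0)

num_classifiers = 200_000                            # size of the hypothetical function pool
true_errors = rng.beta(2, 5, size=num_classifiers)   # assumed prior over true errors
eps = 0.1                                            # threshold defining a "bad" solution

for n in [0, 10, 50, 100, 200]:
    # Probability that each classifier reaches zero training error on n samples
    p_interpolate = (1.0 - true_errors) ** n
    interpolates = rng.random(num_classifiers) < p_interpolate

    survivors = true_errors[interpolates]
    frac_bad = np.mean(survivors > eps) if survivors.size else float("nan")
    print(f"n={n:4d}  interpolating classifiers={survivors.size:7d}  "
          f"fraction bad (err > {eps}) = {frac_bad:.4f}")
```

Under these assumptions, the fraction of bad solutions among the interpolating classifiers shrinks rapidly with n even though the pool size (a stand-in for model complexity) never enters the calculation, which mirrors the paper's claim that the bound need not depend on the size of the classifier function set.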