Paper Title


Good Classifiers are Abundant in the Interpolating Regime

Authors

Ryan Theisen, Jason M. Klusowski, Michael W. Mahoney

Abstract


Within the machine learning community, the widely-used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the worst-case model one could have fit to the data, but it has fundamental limitations. Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes. We apply our method to compute this distribution for several real and synthetic datasets, with both linear and random feature classification models. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model on the same datasets, indicating that "bad" classifiers are extremely rare. We provide theoretical results in a simple setting in which we characterize the full asymptotic distribution of test errors, and we show that these indeed concentrate around a value $\varepsilon^*$, which we also identify exactly. We then formalize a more general conjecture supported by our empirical findings. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice, and that approaches based on the statistical mechanics of learning may offer a promising alternative.
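The central object in the abstract is the distribution of test errors over *interpolating* classifiers, i.e. classifiers that fit the training data perfectly. The following is a minimal illustrative sketch of that notion, not the paper's actual methodology (the paper computes the distribution precisely; this toy version just rejection-samples random linear classifiers in an assumed teacher-student setting, with all dimensions and sample sizes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                    # input dimension (assumption)
w_star = rng.normal(size=d)              # hypothetical "teacher" direction

def make_data(n):
    """Gaussian inputs, binary labels given by the teacher direction."""
    X = rng.normal(size=(n, d))
    return X, np.sign(X @ w_star)

X_train, y_train = make_data(10)
X_test, y_test = make_data(5000)

# Rejection-sample candidate weight vectors; an "interpolating" classifier
# is one that classifies every training point correctly.
W = rng.normal(size=(200_000, d))
interpolates = np.all(np.sign(X_train @ W.T) == y_train[:, None], axis=0)
W_int = W[interpolates]

# Empirical distribution of test errors over the interpolating classifiers.
errors = (np.sign(X_test @ W_int.T) != y_test[:, None]).mean(axis=0)

print(f"{len(errors)} interpolators sampled")
print(f"typical (mean) test error: {errors.mean():.3f}")
print(f"spread (std): {errors.std():.3f}, worst sampled: {errors.max():.3f}")
```

In this toy setting the sampled test errors cluster around a typical value well below the worst sampled error, mirroring the abstract's claim that "bad" interpolating classifiers are rare; the paper's results make this precise for linear and random-feature models on real and synthetic data.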
