Paper title
Over-parameterization: A Necessary Condition for Models that Extrapolate
Paper authors
Paper abstract
In this work, we study over-parameterization as a necessary condition for a model's ability to extrapolate outside the convex hull of its training set. Specifically, we consider classification models, e.g., image classification and other applications of deep learning. Such models are classification functions that partition their domain and assign a class to each partition \cite{strang2019linear}. Partitions are defined by decision boundaries, and so is the classification model/function. The convex hull of the training set may occupy only a subset of the domain, but a trained model may partition the entire domain and not just the convex hull of the training set. This is important because many testing samples may lie outside the convex hull of the training set, and the way in which a model partitions its domain outside the convex hull would influence its generalization. Using approximation theory, we prove that over-parameterization is a necessary condition for having control over the partitioning of the domain outside the convex hull of the training set. We also propose a clearer definition of the notion of over-parameterization based on the learning task and the training set at hand. We present empirical evidence about the geometry of datasets, both image and non-image, to provide insights about the extent of extrapolation performed by the models. We consider a 64-dimensional feature space learned by a ResNet model and investigate the geometric arrangements of convex hulls and decision boundaries in that space. We also formalize the notion of extrapolation and relate it to the scope of the model. Finally, we review the rich extrapolation literature in pure and applied mathematics, e.g., Whitney's extension problem, and place our theory in that context.
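The distinction the abstract draws between test samples inside and outside the convex hull of the training set can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes SciPy is available, and the helper name `in_convex_hull` and the toy 2-D data are hypothetical. A query point lies in the hull exactly when it can be written as a convex combination of the training points, which is a linear feasibility problem.

```python
import numpy as np
from scipy.optimize import linprog


def in_convex_hull(train, query):
    """Return True if `query` is a convex combination of the rows of `train`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    n = train.shape[0]
    # Feasibility LP: find weights w >= 0 with sum(w) = 1 and train.T @ w = query.
    a_eq = np.vstack([train.T, np.ones((1, n))])
    b_eq = np.append(query, 1.0)
    # The objective is zero because only feasibility matters here.
    result = linprog(c=np.zeros(n), A_eq=a_eq, b_eq=b_eq, bounds=(0, None))
    return bool(result.success)


# Toy 2-D "training set" (an assumption): the corners of the unit square.
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
inside = in_convex_hull(train, [0.5, 0.5])   # query within the hull
outside = in_convex_hull(train, [1.5, 0.5])  # query beyond the hull
```

A linear program is used here because it scales to high-dimensional feature spaces, such as the 64-dimensional one mentioned in the abstract, where explicitly constructing the hull is impractical.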