Paper Title

Statistical learning theory of structured data

Authors

Pastore, Mauro, Rotondo, Pietro, Erba, Vittorio, Gherardi, Marco

Abstract


The traditional approach of statistical physics to supervised learning routinely assumes unrealistic generative models for the data: usually inputs are independent random variables, uncorrelated with their labels. Only recently, statistical physicists started to explore more complex forms of data, such as equally-labelled points lying on (possibly low dimensional) object manifolds. Here we provide a bridge between this recently-established research area and the framework of statistical learning theory, a branch of mathematics devoted to inference in machine learning. The overarching motivation is the inadequacy of the classic rigorous results in explaining the remarkable generalization properties of deep learning. We propose a way to integrate physical models of data into statistical learning theory, and address, with both combinatorial and statistical mechanics methods, the computation of the Vapnik-Chervonenkis entropy, which counts the number of different binary classifications compatible with the loss class. As a proof of concept, we focus on kernel machines and on two simple realizations of data structure introduced in recent physics literature: $k$-dimensional simplexes with prescribed geometric relations and spherical manifolds (equivalent to margin classification). Entropy, contrary to what happens for unstructured data, is nonmonotonic in the sample size, in contrast with the rigorous bounds. Moreover, data structure induces a novel transition beyond the storage capacity, which we advocate as a proxy of the nonmonotonicity, and ultimately a cue of low generalization error. The identification of a synaptic volume vanishing at the transition allows a quantification of the impact of data structure within replica theory, applicable in cases where combinatorial methods are not available, as we demonstrate for margin learning.
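A minimal, hedged sketch (not the authors' computation) of the counting problem behind the Vapnik-Chervonenkis entropy mentioned above: for a linear (kernel) classifier, Cover's function-counting theorem gives the exact number of linearly separable dichotomies of p points in general position in R^n, and a Monte Carlo scan over random weight vectors gives an empirical lower bound on that count. The optional margin constraint is only a simplified stand-in for the margin/spherical-manifold data structure discussed in the abstract; the helper names and parameters (cover_count, empirical_dichotomies, margin) are illustrative assumptions, not from the paper.

```python
import numpy as np
from math import comb


def cover_count(p, n):
    """Cover's function-counting theorem (1965): number of linearly separable
    dichotomies of p points in general position in R^n (homogeneous separator)."""
    return 2 * sum(comb(p - 1, k) for k in range(n))


def empirical_dichotomies(X, n_samples=200_000, margin=0.0, seed=None):
    """Monte Carlo lower bound on the number of distinct dichotomies realizable
    by sign(w . x) on the rows of X, optionally requiring every point to be
    classified with |w . x| >= margin * ||x|| (a toy margin constraint)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    W = rng.standard_normal((n_samples, n))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm weight vectors
    scores = W @ X.T                                 # shape (n_samples, p)
    if margin > 0.0:
        # keep only weight vectors that satisfy the margin on every point
        ok = np.all(np.abs(scores) >= margin * np.linalg.norm(X, axis=1), axis=1)
        scores = scores[ok]
    patterns = {tuple(row) for row in np.sign(scores).astype(int)}
    return len(patterns)


if __name__ == "__main__":
    p, n = 8, 3
    X = np.random.default_rng(0).standard_normal((p, n))
    print("Cover's exact count      :", cover_count(p, n))
    print("MC estimate (no margin)  :", empirical_dichotomies(X, seed=1))
    print("MC estimate (margin 0.5) :", empirical_dichotomies(X, margin=0.5, seed=1))
```

In this toy setting the unconstrained Monte Carlo estimate should approach Cover's exact count, while increasing the margin shrinks the set of compatible dichotomies; this is, qualitatively, the effect of data structure on the VC entropy that the abstract refers to.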
