Paper Title
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers
Paper Authors
Paper Abstract
This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider a high-dimensional setting where the number of features (weak learners) $p$ scales with the sample size $n$, in an overparametrized regime. Under a class of statistical models, we provide an exact analysis of the generalization error of boosting when the algorithm interpolates the training data and maximizes the empirical $\ell_1$-margin. Further, we explicitly pin down the relation between the boosting test error and the optimal Bayes error, as well as the proportion of active features at interpolation (with zero initialization). In turn, these precise characterizations answer certain questions surrounding boosting raised in \cite{breiman1999prediction, schapire1998boosting}, under assumed data-generating processes. At the heart of our theory lies an in-depth study of the maximum-$\ell_1$-margin, which can be accurately described by a new system of non-linear equations; to analyze this margin, we rely on Gaussian comparison techniques and develop a novel uniform deviation argument. Our statistical and computational arguments can handle (1) any finite-rank spiked covariance model for the feature distribution and (2) variants of boosting corresponding to general $\ell_q$-geometry, $q \in [1, 2]$. As a final component, via the Lindeberg principle, we establish a universality result showing that the scaled $\ell_1$-margin (asymptotically) remains the same whether the covariates used for boosting arise from a non-linear random feature model or from an appropriately linearized model with matching moments.
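For concreteness, the maximum-$\ell_q$-margin underlying this analysis admits a standard variational form; the display below is a sketch of that definition under assumed notation (features $x_i \in \mathbb{R}^p$ and labels $y_i \in \{\pm 1\}$, introduced here for illustration and not taken from the abstract), with $q = 1$ recovering the boosting case:
$$
\kappa_q \;=\; \max_{\theta \in \mathbb{R}^p,\; \|\theta\|_q \le 1} \;\; \min_{1 \le i \le n} \; y_i \langle x_i, \theta \rangle .
$$
The data are linearly separable precisely when $\kappa_q > 0$, and the new system of non-linear equations referenced in the abstract characterizes the asymptotic value of this margin for $q = 1$.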
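Because the $\ell_1$-ball is polyhedral, the $q = 1$ max-margin problem is a linear program and can be solved exactly at finite $(n, p)$. The snippet below is a minimal illustrative sketch of that computation on synthetic separable data; all function and variable names are hypothetical, and this computes the interpolating limit object rather than the paper's boosting iteration.

```python
import numpy as np
from scipy.optimize import linprog

def max_l1_margin(X, y):
    """Max-l1-margin: max over ||theta||_1 <= 1 of min_i y_i <x_i, theta>.

    Cast as an LP in z = [u; v; kappa] with theta = u - v, u, v >= 0:
        maximize kappa
        s.t.     y_i x_i^T (u - v) >= kappa   for each sample i
                 sum(u) + sum(v)   <= 1       (l1-norm constraint)
    """
    n, p = X.shape
    signed = y[:, None] * X                      # rows are y_i x_i^T
    c = np.concatenate([np.zeros(2 * p), [-1.0]])  # minimize -kappa
    # Margin constraints rearranged to A_ub z <= 0:
    #   -y_i x_i^T u + y_i x_i^T v + kappa <= 0
    A_margin = np.hstack([-signed, signed, np.ones((n, 1))])
    # l1 constraint: 1^T u + 1^T v <= 1.
    A_l1 = np.concatenate([np.ones(2 * p), [0.0]])[None, :]
    A_ub = np.vstack([A_margin, A_l1])
    b_ub = np.concatenate([np.zeros(n), [1.0]])
    bounds = [(0, None)] * (2 * p) + [(None, None)]  # u, v >= 0; kappa free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    assert res.success, res.message
    u, v, kappa = res.x[:p], res.x[p:2 * p], res.x[-1]
    return kappa, u - v

# Synthetic separable data in an overparametrized regime (p > n).
rng = np.random.default_rng(0)
n, p = 50, 200
theta_star = np.zeros(p)
theta_star[:5] = 1.0                             # sparse planted direction
X = rng.standard_normal((n, p))
y = np.sign(X @ theta_star)                      # labels consistent with theta_star
kappa, theta_hat = max_l1_margin(X, y)
active = (np.abs(theta_hat) > 1e-8).mean()       # proportion of active features
print(f"max-l1-margin: {kappa:.4f}, active feature proportion: {active:.3f}")
```

Since the planted direction $\theta^\star / \|\theta^\star\|_1$ is feasible with a strictly positive margin, the reported $\kappa$ is positive, and the proportion of nonzero coordinates of the maximizer offers a finite-sample analogue of the "proportion of active features at interpolation" discussed in the abstract.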