Paper Title
Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions
Paper Authors
Paper Abstract
Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal-processing and machine-learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes, where both the number of measurements and the number of unknown parameters are large, has only recently begun to emerge. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference in high-dimensional generalized linear models. For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter. Our precise analysis has several attributes. First, it leads to a recipe for optimally tuning the loss function and the regularization parameter. Second, it allows us to precisely quantify the sub-optimality of popular heuristic choices: for instance, we show that optimally-tuned least-squares is (perhaps surprisingly) approximately optimal for standard logistic data, but that the sub-optimality gap grows drastically as the signal strength increases. Third, we use the bounds to precisely assess the merits of ridge regularization as a function of the over-parameterization ratio. Notably, our bounds are expressed in terms of the Fisher information of random variables that are simple functions of the data distribution, thus establishing ties to corresponding bounds in classical statistics.
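For concreteness, the estimator the abstract refers to can be sketched in standard notation; the normalization and symbols below (loss $\mathcal{L}$, regularization parameter $\lambda$, ratio $\delta$) are conventional assumptions, not copied from the paper.

```latex
% Ridge-regularized convex ERM for a generalized linear model
% (a standard formulation; normalization conventions are assumed here):
% data (x_i, y_i) with Gaussian features x_i in R^p, convex loss L, lambda >= 0.
\[
  \hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^p}\;
  \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\bigl(y_i,\, x_i^\top \beta\bigr)
  \;+\; \frac{\lambda}{2}\,\lVert \beta \rVert_2^2,
\]
% studied in the proportional high-dimensional regime
% n, p -> infinity with p/n -> delta (the over-parameterization ratio).
```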
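The abstract's comparison of least-squares against a logistic-model-aware fit lends itself to a quick numerical illustration. Below is a minimal simulation sketch, not the paper's code: the problem sizes, feature scaling, signal strength `snr`, and regularization grid are all illustrative assumptions. It fits ridge-regularized least-squares and ridge-regularized logistic regression to synthetic logistic data with Gaussian features in the proportional regime; per the abstract, the gap between the two should widen as `snr` is increased.

```python
# Minimal sketch (assumed setup, not the paper's experiments): compare
# ridge-regularized least-squares and logistic regression on logistic data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 1000                    # proportional regime: p/n = 0.5 (assumed sizes)
snr = 2.0                            # signal strength ||beta*|| (assumed)

# Synthetic logistic data with i.i.d. standard Gaussian features.
beta_star = rng.standard_normal(p)
beta_star *= snr / np.linalg.norm(beta_star)
X = rng.standard_normal((n, p))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_star))).astype(float)

def ridge_ls(lam):
    """Ridge-regularized least-squares on +/-1 labels (closed form)."""
    s = 2.0 * y - 1.0
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ s / n)

def ridge_logistic(lam, steps=500, lr=0.5):
    """Ridge-regularized logistic regression via plain gradient descent."""
    b = np.zeros(p)
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-X @ b))
        b -= lr * (X.T @ (mu - y) / n + lam * b)
    return b

def angular_error(b):
    """1 - cos(angle to beta*); see the note on scale below."""
    return 1.0 - b @ beta_star / (np.linalg.norm(b) * np.linalg.norm(beta_star))

for lam in (1e-3, 1e-2, 1e-1, 1.0):
    print(f"lam={lam:g}  LS={angular_error(ridge_ls(lam)):.4f}"
          f"  logistic={angular_error(ridge_logistic(lam)):.4f}")
```

Because binary responses determine the least-squares fit only up to a scalar multiple of $\beta^\star$, the comparison uses the angle to $\beta^\star$ rather than the raw Euclidean error.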
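The abstract's closing remark ties the new bounds to Fisher-information bounds from classical statistics; for reference, the classical benchmark alluded to is the textbook Cramér-Rao bound (stated below as background, not as the paper's result).

```latex
% Classical Cramer-Rao lower bound for an unbiased estimator \hat{\theta}
% from n i.i.d. samples, with Fisher information I(theta); the paper's
% high-dimensional bounds are analogues expressed via the Fisher information
% of random variables that are simple functions of the data distribution.
\[
  \operatorname{Var}\bigl(\hat{\theta}\bigr) \;\ge\; \frac{1}{n\,\mathcal{I}(\theta)}.
\]
```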