Paper Title
Margin Maximization as Lossless Maximal Compression
Paper Authors
Paper Abstract
The ultimate goal of a supervised learning algorithm is to produce models, built on the training data, that generalize well to new examples. In classification, functional margin maximization (correctly classifying as many training examples as possible, each with maximal confidence) is known to yield models with good generalization guarantees. This work gives an information-theoretic interpretation of a margin-maximizing model on a noiseless training dataset as one that achieves lossless maximal compression of that dataset, i.e., one that extracts from the features all the information useful for predicting the label, and no more. This connection offers new insights into generalization in supervised machine learning, casting margin maximization as a special case (that of classification) of a more general principle, and it explains the success and potential limitations of popular learning algorithms such as gradient boosting. We support our observations with theoretical arguments and empirical evidence, and we identify interesting directions for future work.
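To make the two notions in the abstract concrete, here is a minimal sketch in standard notation (the symbols below are ours, assuming binary labels and the usual information-bottleneck definitions; the paper's precise formalization may differ):

\begin{align*}
% Functional margin of example (x_i, y_i), with y_i \in \{-1,+1\} and scorer f:
\gamma_i &= y_i \, f(x_i), \\
% Margin maximization: maximize the confidence of the worst-classified example
f^\star &= \arg\max_f \; \min_i \; y_i \, f(x_i). \\
% A representation Z of the features X is a lossless compression w.r.t. label Y if
I(Z; Y) &= I(X; Y), \\
% and maximal if, among all lossless Z, it keeps the least about X:
Z^\star &= \arg\min_{Z \,:\, I(Z;Y) = I(X;Y)} \; I(Z; X).
\end{align*}

Under this reading, the abstract's claim is that on a noiseless dataset the margin-maximizing model plays the role of such a $Z^\star$: it retains everything in the features that predicts the label and discards everything else.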