Paper Title
Convergence Properties of Stochastic Hypergradients
Authors
Abstract
Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. The method that we propose is a stochastic variant of the approximate implicit differentiation approach in (Pedregosa, 2016). We provide bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. In particular, our main bound is agnostic to the choice of the two stochastic solvers employed by the procedure. We provide numerical experiments to support our theoretical analysis and to show the advantage of using stochastic hypergradients in practice.
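A brief sketch of the quantity at stake, in illustrative notation not taken from the paper: write the upper-level objective as f(λ) = E(w(λ), λ), where the lower-level solution w(λ) is the fixed point of a mapping Φ(·, λ) that is a contraction (for empirical risk minimization, Φ could be one gradient step on the training loss). Implicit differentiation then expresses the hypergradient as

\[
\nabla f(\lambda) \;=\; \nabla_\lambda E\big(w(\lambda), \lambda\big) \;+\; \partial_\lambda \Phi\big(w(\lambda), \lambda\big)^{\!\top} v(\lambda),
\qquad
\Big(I - \partial_w \Phi\big(w(\lambda), \lambda\big)^{\!\top}\Big)\, v(\lambda) \;=\; \nabla_w E\big(w(\lambda), \lambda\big).
\]

In an approximate-implicit-differentiation scheme of the kind described above, w(λ) is replaced by the output of a stochastic solver for the lower-level problem (e.g., SGD on the empirical risk) and v(λ) by an approximate solution of the linear system produced by a second stochastic solver; the mean square error bounds concern the resulting estimate of ∇f(λ).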