Paper Title

A General Family of Stochastic Proximal Gradient Methods for Deep Learning

Paper Authors

Jihun Yun, Aurelie C. Lozano, Eunho Yang

Paper Abstract

We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose a unified framework for stochastic proximal gradient descent, which we term ProxGen, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods without preconditioners as special cases, which have been extensively studied in various settings. Not only that, we present two important update rules beyond the well-known standard methods as a byproduct of our approach: (i) the first closed-form proximal mappings of $\ell_q$ regularization ($0 \leq q \leq 1$) for adaptive stochastic gradient methods, and (ii) a revised version of ProxQuant that fixes a caveat of the original approach for quantization-specific regularizers. We analyze the convergence of ProxGen and show that the whole family of ProxGen enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners. We also empirically show the superiority of proximal methods compared to subgradient-based approaches via extensive experiments. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.
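
To make the preconditioned proximal update described above concrete, the following is a minimal sketch of one such step, assuming an Adam-style diagonal preconditioner and an $\ell_1$ regularizer (whose proximal map is per-coordinate soft-thresholding). The function name, hyperparameters, and the omission of bias correction and the paper's exact momentum handling are illustrative assumptions, not the precise ProxGen rule from the paper.

```python
import numpy as np

def preconditioned_prox_l1_step(theta, grad, m, v, lr=1e-3, beta1=0.9,
                                beta2=0.999, lam=1e-4, eps=1e-8):
    """One illustrative preconditioned proximal step (hypothetical sketch).

    Combines an Adam-style diagonal preconditioner with the closed-form
    prox of lam * ||theta||_1 (soft-thresholding scaled per coordinate).
    """
    # Adam-style moment estimates; the diagonal preconditioner is built from v.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    precond = np.sqrt(v) + eps              # diagonal preconditioner

    # Preconditioned stochastic gradient step.
    z = theta - lr * m / precond

    # Closed-form proximal map of the l1 regularizer in the preconditioned
    # metric: soft-thresholding with per-coordinate threshold lr * lam / precond.
    thresh = lr * lam / precond
    theta_new = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return theta_new, m, v
```

The same template applies to other lower semi-continuous regularizers by swapping the soft-thresholding step for the corresponding (possibly non-convex, e.g. $\ell_q$ with $0 \leq q \leq 1$) proximal map, which is the setting the paper's closed-form derivations address.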
