Paper Title
Statistical Adaptive Stochastic Gradient Methods
Paper Authors
Paper Abstract
We propose a statistical adaptive procedure called SALSA for automatically scheduling the learning rate (step size) in stochastic gradient methods. SALSA first uses a smoothed stochastic line-search procedure to gradually increase the learning rate, then automatically switches to a statistical method to decrease the learning rate. The line-search procedure "warms up" the optimization process, reducing the need for expensive trial and error in setting an initial learning rate. The method for decreasing the learning rate is based on a new statistical test for detecting stationarity when using a constant step size. Unlike in prior work, our test applies to a broad class of stochastic gradient algorithms without modification. The combined method is highly robust and autonomous, and it matches the performance of the best hand-tuned learning rate schedules in our experiments on several deep learning tasks.
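
The idea of lowering the learning rate once training at a constant step size has stopped making progress can be illustrated with a small sketch. This is not the SALSA test itself, which the abstract does not spell out. It swaps in a simple comparison of the mean loss in the two halves of a window, applied to plain SGD on a noisy one-dimensional quadratic. The names `looks_stationary` and `run_sgd`, the threshold, the window length, and the decay factor are all hypothetical choices made for illustration. The line-search warm-up phase is omitted.

```python
import math
import random
import statistics


def looks_stationary(losses, z_threshold=1.0):
    """Return True if the second half of `losses` is not clearly lower than the first.

    Illustrative stand-in for a stationarity test: a z-score on the difference
    of the two half-window means. It ignores autocorrelation between iterates.
    """
    half = len(losses) // 2
    first, second = losses[:half], losses[half:2 * half]
    diff = statistics.mean(first) - statistics.mean(second)
    # Standard error of the difference between the two half-window means.
    se = math.sqrt(statistics.variance(first) / half
                   + statistics.variance(second) / half)
    if se == 0.0:
        return diff <= 0.0
    return diff / se < z_threshold


def run_sgd(steps=3000, lr=0.5, window=200, decay=0.1, noise=1.0, seed=0):
    """Run SGD on f(x) = x^2 / 2 with noisy gradients and a test-driven schedule."""
    rng = random.Random(seed)
    x = 10.0
    losses, lr_history = [], [lr]
    for _ in range(steps):
        losses.append(0.5 * x * x)              # loss at the current iterate
        grad = x + noise * rng.gauss(0.0, 1.0)  # noisy gradient of f
        x -= lr * grad
        if len(losses) == window:
            # Once per window: shrink the step size if progress has stalled.
            if looks_stationary(losses):
                lr *= decay
                lr_history.append(lr)
            losses = []
    return x, lr_history


if __name__ == "__main__":
    final_x, history = run_sgd()
    print("final x:", final_x)
    print("learning rates used:", history)
```

With a constant step size, the iterates settle into a noise-dominated regime where the loss stops decreasing on average. The sketch treats the absence of a clear decrease within a window as the signal to shrink the step size.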