Paper Title

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Paper Authors

Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

Paper Abstract

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges to the global optimal solutions (in the convex scenario) or the stationary points (in the non-convex scenario), but also promotes the sparsity of the solutions substantially. Particularly, on a large number of convex problems, OBProx-SG comprehensively outperforms the existing methods in terms of sparsity exploration and objective values. Moreover, the experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving solutions of much higher sparsity without sacrificing generalization accuracy.
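The abstract describes OBProx-SG as alternating two updates: a proximal stochastic gradient step and an orthant step with orthant face projection. Below is a minimal NumPy sketch of those two updates under our own assumptions; the function names (soft_threshold, prox_sg_step, orthant_step) and the parameters lr (step size) and lam (regularization weight) are illustrative, and the schedule for switching between the two steps follows the paper rather than this sketch.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sg_step(x, stoch_grad, lr, lam):
    """Step (i): proximal stochastic gradient step.

    Takes a stochastic gradient step on the smooth loss and applies
    soft-thresholding; the resulting zero/non-zero pattern serves as a
    prediction of the support cover of the solution.
    """
    return soft_threshold(x - lr * stoch_grad, lr * lam)

def orthant_step(x, stoch_grad, lr, lam):
    """Step (ii): orthant step with orthant face projection.

    On the orthant face selected by sign(x), the l1 term is linear, so a
    stochastic gradient step is taken on f(x) + lam * sign(x)^T x and the
    result is projected back onto the face: entries that would leave the
    orthant (change sign) are set exactly to zero, which aggressively
    promotes sparsity.
    """
    sign = np.sign(x)
    trial = x - lr * (stoch_grad + lam * sign)
    trial[sign == 0] = 0.0                 # variables off the predicted support stay at zero
    trial[np.sign(trial) != sign] = 0.0    # orthant face projection: zero out sign flips
    return trial
```

In the method as described, training first runs the Prox-SG step to predict the support, then switches to the orthant step to refine the non-zero variables and drive borderline ones exactly to zero; the precise switching rule is specified in the paper.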
