Paper Title

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Paper Authors

Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Jing Tian, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

Paper Abstract

Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the $\ell_1$-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, OBProx-SG not only converges to the global optimal solutions (in the convex scenario) or the stationary points (in the non-convex scenario), but also promotes the sparsity of the solutions substantially. Particularly, on a large number of convex problems, OBProx-SG comprehensively outperforms the existing methods in terms of sparsity exploration and objective values. Moreover, the experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving solutions of much higher sparsity without sacrificing generalization accuracy.
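The abstract describes OBProx-SG as alternating two updates: a proximal stochastic gradient step and an orthant step with orthant face projection. Below is a minimal NumPy sketch of those two updates under our own assumptions; the function names (soft_threshold, prox_sg_step, orthant_step) and the parameters lr (step size) and lam (regularization weight) are illustrative, and the schedule for switching between the two steps follows the paper rather than this sketch.

```python
import numpy as np

def soft_threshold(z, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sg_step(x, stoch_grad, lr, lam):
    """Step (i): proximal stochastic gradient step.

    Takes a stochastic gradient step on the smooth loss and applies
    soft-thresholding; the resulting zero/non-zero pattern serves as a
    prediction of the support cover of the solution.
    """
    return soft_threshold(x - lr * stoch_grad, lr * lam)

def orthant_step(x, stoch_grad, lr, lam):
    """Step (ii): orthant step with orthant face projection.

    On the orthant face selected by sign(x), the l1 term is linear, so a
    stochastic gradient step is taken on f(x) + lam * sign(x)^T x and the
    result is projected back onto the face: entries that would leave the
    orthant (change sign) are set exactly to zero, which aggressively
    promotes sparsity.
    """
    sign = np.sign(x)
    trial = x - lr * (stoch_grad + lam * sign)
    trial[sign == 0] = 0.0                 # variables off the predicted support stay at zero
    trial[np.sign(trial) != sign] = 0.0    # orthant face projection: zero out sign flips
    return trial
```

In the method as described, training first runs the Prox-SG step to predict the support, then switches to the orthant step to refine the non-zero variables and drive borderline ones exactly to zero; the precise switching rule is specified in the paper.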
