论文标题
矩阵游戏的坐标方法
Coordinate Methods for Matrix Games
论文作者
论文摘要
我们开发了用于求解表单$ \ min_ {x \ in \ Mathcal {x}} \ max_ {y max_ {y in \ Mathcal {y}} y^\ top X $的X $的X $的原始双坐标方法$ \ min_ {x \ in \ mathcal {x}} \ max_ {y \ in \ max_ {x \ in \ mathcal {x}} $,这些X $包含线性编程,分类,分类和专门的回归。我们的方法将现有的完全随机的均方根方法和方差减少方法推向了它们的限制复杂性和样品复杂性方面的限制。我们通过设计有效的数据结构来利用Taylor近似值和二项式堆,从而获得几乎构恒定的触电复杂性。我们使用依赖于矩阵条目的迭代和幅度的动态采样分布来通过低变化梯度估计器提高样品复杂性。 我们的运行时界限通过一个因素改善了现有的原始偶二方法,具体取决于$ m $ $ n $矩阵$ a $的稀疏度度量。例如,当行和列具有常数$ \ ell_1/\ ell_2 $ norm比率时,我们在完全随机设置中提供的改进为$ m+n $,而$ \ sqrt {m+n} $在变异降低的设置中。我们将方法应用于计算几何问题,即最小封闭球,最大铭刻球和线性回归,并获得改善的复杂性界限。对于使用元素的非负矩阵的线性回归,我们的保证在确切的梯度方法上提高了$ \ sqrt {\ mathrm {nnz}(a)/(m+n)} $的倍数。
We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min_{x \in \mathcal{X}} \max_{y\in\mathcal{Y}} y^\top A x$ which contain linear programming, classification, and regression as special cases. Our methods push existing fully stochastic sublinear methods and variance-reduced methods towards their limits in terms of per-iteration complexity and sample complexity. We obtain nearly-constant per-iteration complexity by designing efficient data structures leveraging Taylor approximations to the exponential and a binomial heap. We improve sample complexity via low-variance gradient estimators using dynamic sampling distributions that depend on both the iterates and the magnitude of the matrix entries. Our runtime bounds improve upon those of existing primal-dual methods by a factor depending on sparsity measures of the $m$ by $n$ matrix $A$. For example, when rows and columns have constant $\ell_1/\ell_2$ norm ratios, we offer improvements by a factor of $m+n$ in the fully stochastic setting and $\sqrt{m+n}$ in the variance-reduced setting. We apply our methods to computational geometry problems, i.e. minimum enclosing ball, maximum inscribed ball, and linear regression, and obtain improved complexity bounds. For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.