论文标题
约翰逊 - 林登斯特劳斯的引理聚类和子空间近似:从降低尺寸。
The Johnson-Lindenstrauss Lemma for Clustering and Subspace Approximation: From Coresets to Dimension Reduction
论文作者
论文摘要
我们研究了约翰逊·林顿斯(Johnson-Lindenstrauss)在各种投射聚类问题中转变的影响,从而推广了仅适用于中心聚类的最新结果[MMR19]。我们提出一个总体问题:对于欧几里得优化问题和准确的参数$ε\ in(0,1)$,什么是最小的目标维度$ t \ in \ mathbb {n} $,以至于约翰逊·林斯特拉斯(Johnson-Lindenstrauss $(1+ε)$ - 因子。我们提供了一种新技术,该技术使用核心结构来分析Johnson-Lindenstrauss变换的效果。除了适用于基于中心的聚类,我们的技术还改善了(或第一个解决)其他欧几里得优化问题,包括: $ \ bullet $ for $(k,z)$ - 子空间近似:我们表明$ t = \ tilde {o}(zk^2 /ε^3)$足够,而先前的最佳绑定为$ O(k /ε^2)$,仅适用于case $ z = 2 $ z = 2 $ [CEMMP15]。 $ \ bullet $ for $(k,z)$ - 平面近似值:我们显示$ t = \ tilde {o}(zk^2/ε^3)$足够,完全消除了先前绑定的$ \ tilde {o}(zk^2 \ log n/ε^3)$ [kkk^2 \ log n/ε^3)$的依赖。 $ \ bullet $ for $(k,z)$ - 行近似:我们显示$ t = o(((k \ log \ log \ log \ log n + z + \ log(1 /ε)) /ε^3)$足够,而我们的是第一个给出任何降低尺寸的结果。
We study the effect of Johnson-Lindenstrauss transforms in various projective clustering problems, generalizing recent results which only applied to center-based clustering [MMR19]. We ask the general question: for a Euclidean optimization problem and an accuracy parameter $ε\in (0, 1)$, what is the smallest target dimension $t \in \mathbb{N}$ such that a Johnson-Lindenstrauss transform $Π\colon \mathbb{R}^d \to \mathbb{R}^t$ preserves the cost of the optimal solution up to a $(1+ε)$-factor. We give a new technique which uses coreset constructions to analyze the effect of the Johnson-Lindenstrauss transform. Our technique, in addition applying to center-based clustering, improves on (or is the first to address) other Euclidean optimization problems, including: $\bullet$ For $(k,z)$-subspace approximation: we show that $t = \tilde{O}(zk^2 / ε^3)$ suffices, whereas the prior best bound, of $O(k/ε^2)$, only applied to the case $z = 2$ [CEMMP15]. $\bullet$ For $(k,z)$-flat approximation: we show $t = \tilde{O}(zk^2/ε^3)$ suffices, completely removing the dependence on $n$ from the prior bound $\tilde{O}(zk^2 \log n/ε^3)$ of [KR15]. $\bullet$ For $(k,z)$-line approximation: we show $t = O((k \log \log n + z + \log(1/ε)) / ε^3)$ suffices, and ours is the first to give any dimension reduction result.