Paper Title

Learnable Subspace Clustering

Authors

Jun Li, Hongfu Liu, Zhiqiang Tao, Handong Zhao, Yun Fu

Abstract

This paper studies the large-scale subspace clustering (LSSC) problem with millions of data points. Many popular subspace clustering methods cannot directly handle the LSSC problem, although they are considered state-of-the-art for small-scale data. A basic reason is that these methods typically use all data points as a large dictionary to build huge coding models, which results in high time and space complexity. In this paper, we develop a learnable subspace clustering paradigm to efficiently solve the LSSC problem. The key idea is to learn a parametric function that partitions the high-dimensional space into its underlying low-dimensional subspaces, avoiding the expensive cost of the classical coding models. Moreover, we propose a unified robust predictive coding machine (RPCM) to learn the parametric function, which can be solved by an alternating minimization algorithm. In addition, we provide a bounded contraction analysis of the parametric function. To the best of our knowledge, this paper is the first work among subspace clustering methods to efficiently cluster millions of data points. Experiments on million-scale datasets verify that our paradigm outperforms the related state-of-the-art methods in both efficiency and effectiveness.
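The core scalability argument above can be illustrated with a minimal NumPy sketch: classical self-expressive coding uses all N points as a dictionary, so the coefficient matrix grows as N×N, while a parametric function of fixed size labels each point in a single pass. The linear map `W` below is a hypothetical stand-in for illustration only, not the actual RPCM model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 20, 2, 1000          # ambient dim, number of subspaces, number of points

# Sample points from k random 3-dimensional subspaces of R^d.
bases = [np.linalg.qr(rng.standard_normal((d, 3)))[0] for _ in range(k)]
X = np.hstack([B @ rng.standard_normal((3, n // k)) for B in bases])

# Classical coding model: coefficients over the whole dictionary X,
# i.e., an N-by-N matrix whose size grows quadratically with N.
coeff_size = (n, n)

# Learnable paradigm: a fixed-size parametric function mapping each
# point directly to a subspace assignment (here a toy linear scorer
# followed by argmax, purely for the complexity contrast).
W = rng.standard_normal((k, d))
labels = np.argmax(np.abs(W @ X), axis=0)   # one O(N) pass over the data

print(coeff_size, W.shape, labels.shape)
```

The parameter count of `W` is independent of N, which is what makes evaluating the learned function feasible at million-point scale, whereas `coeff_size` would be 10^12 entries for N = 10^6.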
