Paper title
Optimal quantization of the mean measure and applications to statistical learning
Paper authors
Paper abstract
This paper addresses the case where data come as point sets or, more generally, as discrete measures. Our motivation is twofold. First, we intend to approximate the mean of the measure-generating process by a compactly supported measure; this mean coincides with the intensity measure in the point process framework, and with the expected persistence diagram in the framework of persistence-based topological data analysis. To this end, we provide two algorithms that we prove to be almost minimax optimal. Second, we build from the estimator of the mean measure a vectorization map that sends every measure into a finite-dimensional Euclidean space, and we investigate its properties through a clustering-oriented lens. In a nutshell, we show that for a mixture of measure-generating processes, our technique yields a representation in $\mathbb{R}^k$, for $k \in \mathbb{N}^*$, that guarantees a good clustering of the data points with high probability. Interestingly, our results apply in the framework of persistence-based shape classification via the ATOL procedure described in \cite{Royer19}.
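The two-step idea in the abstract (quantize the mean measure with $k$ points, then map every measure into $\mathbb{R}^k$) can be illustrated with a minimal sketch. It is not the paper's pair of algorithms: it assumes plain Lloyd iterations on the pooled points as a stand-in for the quantization step, and an exponential kernel around each center as the vectorization. The names `quantize_mean_measure`, `vectorize`, and the `scale` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np


def quantize_mean_measure(point_sets, k, n_iter=50, seed=0):
    """Return k centers approximating the empirical mean measure.

    Assumption: Lloyd (k-means) iterations on the pooled points, which
    carry the empirical mean measure up to normalization.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack(point_sets)  # shape (N, d)
    start = rng.choice(len(pooled), size=k, replace=False)
    centers = pooled[start].copy()
    for _ in range(n_iter):
        # Squared distance from every pooled point to every center.
        d2 = ((pooled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = pooled[labels == j]
            if len(members) > 0:  # leave an empty cell's center unchanged
                centers[j] = members.mean(axis=0)
    return centers


def vectorize(point_set, centers, scale=1.0):
    """Map one point set to R^k by summing a kernel around each center.

    Assumption: an exponential kernel exp(-distance / scale).
    """
    dist = np.linalg.norm(point_set[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-dist / scale).sum(axis=0)


# Toy mixture: two point sets concentrated near (0, 0), two near (3, 3).
rng = np.random.default_rng(1)
point_sets = [
    rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in (0.0, 0.0, 3.0, 3.0)
]
centers = quantize_mean_measure(point_sets, k=2)
vectors = np.array([vectorize(p, centers) for p in point_sets])
```

In this toy mixture, each row of `vectors` is a $k$-dimensional representation of one point set, and rows coming from the same mixture component should lie close together, which is the property a downstream clustering step would rely on.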