Paper title

ClaSP -- Parameter-free Time Series Segmentation

Authors

Arik Ermshaus, Patrick Schäfer, Ulf Leser

Abstract

The study of natural and human-made processes often results in long sequences of temporally ordered values, also known as time series (TS). Such processes often consist of multiple states, e.g. operating modes of a machine, such that state changes in the observed processes result in changes in the distribution of shape of the measured values. Time series segmentation (TSS) tries to find such changes in TS post hoc to deduce changes in the data-generating process. TSS is typically approached as an unsupervised learning problem aiming at the identification of segments distinguishable by some statistical property. Current algorithms for TSS require domain-dependent hyperparameters to be set by the user, or make assumptions about the TS value distribution or the types of detectable changes, which limits their applicability. Common hyperparameters are the measure of segment homogeneity and the number of change points, which are particularly hard to tune for each data set. We present ClaSP, a novel, highly accurate, hyperparameter-free and domain-agnostic method for TSS. ClaSP hierarchically splits a TS into two parts. A change point is determined by training a binary TS classifier for each possible split point and selecting the one split that is best at identifying subsequences to be from either of the partitions. ClaSP learns its two main model parameters from the data using two novel bespoke algorithms. In our experimental evaluation using a benchmark of 107 data sets, we show that ClaSP outperforms the state of the art in terms of accuracy and is fast and scalable. Furthermore, we highlight properties of ClaSP using several real-world case studies.
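
The split-selection idea described in the abstract (score a binary classifier at every candidate split and keep the best one) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbour classifier, the balanced-accuracy score, the fixed window width, the edge margin, and all function names (`sliding_windows`, `knn_indices`, `split_scores`, `best_split`) are assumptions chosen for illustration. It finds a single split only and does not cover the hierarchical splitting or the parameter learning mentioned in the abstract.

```python
import numpy as np

def sliding_windows(ts, width):
    """Return all z-normalised subsequences of length `width` as rows."""
    windows = np.lib.stride_tricks.sliding_window_view(ts, width).astype(float)
    windows = windows - windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    return windows / np.where(std > 0, std, 1.0)

def knn_indices(windows, k, exclusion):
    """Indices of the k nearest windows, ignoring heavily overlapping ones."""
    sq = (windows ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * windows @ windows.T
    idx = np.arange(len(windows))
    # Windows closer than `exclusion` positions overlap too much to count.
    dist[np.abs(idx[:, None] - idx[None, :]) < exclusion] = np.inf
    return np.argsort(dist, axis=1)[:, :k]

def split_scores(ts, width, k=3, margin=None):
    """Score every candidate split by how well a k-NN vote separates
    windows left of the split (label 0) from windows right of it (label 1).
    Assumption: balanced accuracy is used as the score; k should be odd."""
    windows = sliding_windows(np.asarray(ts, dtype=float), width)
    n = len(windows)
    # Neighbours do not depend on the split, so compute them once.
    neighbours = knn_indices(windows, k, exclusion=width // 2)
    margin = 5 * width if margin is None else margin  # skip unstable edges
    scores = np.full(n, np.nan)
    for split in range(margin, n - margin):
        labels = (np.arange(n) >= split).astype(int)
        predicted = (labels[neighbours].mean(axis=1) > 0.5).astype(int)
        recall_left = (predicted[:split] == 0).mean()
        recall_right = (predicted[split:] == 1).mean()
        scores[split] = 0.5 * (recall_left + recall_right)
    return scores

def best_split(ts, width, k=3):
    """Return the highest-scoring split index and the full score profile."""
    scores = split_scores(ts, width, k)
    return int(np.nanargmax(scores)), scores

# Synthetic example: the signal changes frequency at index 300.
rng = np.random.default_rng(0)
t = np.arange(300)
series = np.concatenate([np.sin(2 * np.pi * t / 20), np.sin(2 * np.pi * t / 5)])
series = series + rng.normal(0, 0.05, series.size)
change_point, scores = best_split(series, width=20)
```

On this synthetic series the selected index should fall near the true change at position 300. Computing the neighbours once and only relabelling them per split keeps the loop cheap, at the cost of a quadratic distance matrix that limits the sketch to short series.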
