A Benchmark Study on Time Series Clustering

Paper Title

A Benchmark Study on Time Series Clustering

Authors

Ali Javed, Byung Suk Lee, Donna M. Rizzo

Abstract

This paper presents the first time series clustering benchmark utilizing all time series datasets currently available in the University of California, Riverside (UCR) archive, the state-of-the-art repository of time series data. Specifically, the benchmark examines eight popular clustering methods representing three categories of clustering algorithms (partitional, hierarchical, and density-based) and three types of distance measures (Euclidean, dynamic time warping, and shape-based). We lay out six restrictions with special attention to making the benchmark as unbiased as possible. A phased evaluation approach was then designed for summarizing dataset-level assessment metrics and discussing the results. The benchmark study presented can be a useful reference for the research community on its own, and the dataset-level assessment metrics reported may be used for designing evaluation frameworks to answer different research questions.
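To make two of the distance measures named in the abstract concrete, here is a minimal sketch in plain Python of Euclidean distance and unconstrained dynamic time warping (DTW). It is an illustration only, not the benchmark's implementation: the function names, the squared-difference local cost, and the absence of a warping window are all assumptions, and the shape-based measure is omitted.

```python
import math


def euclidean(a, b):
    """Point-by-point distance; requires series of equal length."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW with squared local cost (an assumed variant)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the best of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# The same bump, shifted by one time step.
s1 = [0, 0, 1, 2, 1, 0]
s2 = [0, 1, 2, 1, 0, 0]
print(euclidean(s1, s2))  # penalizes the shift
print(dtw(s1, s2))        # warping absorbs the shift
```

With these two series the Euclidean distance is nonzero while the DTW distance is zero, which illustrates why the choice of distance measure can change which series a clustering algorithm groups together.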
