Paper Title
Time Series Data Cleaning: From Anomaly Detection to Anomaly Repairing (Technical Report)
Paper Authors
Paper Abstract
Errors are prevalent in time series data, such as GPS trajectories or sensor readings. Existing methods focus more on detecting anomalies than on repairing them. When dirty data is simply filtered out via anomaly detection, applications can still be unreliable over the resulting incomplete time series. Instead of discarding anomalies, we propose to repair them iteratively, combining the temporal nature exploited in anomaly detection with the minimum change principle widely considered in data repairing. Our major contributions include: (1) a novel framework of iterative minimum repairing (IMR) over time series data, (2) an explicit analysis of the convergence of the proposed iterative minimum repairing, and (3) efficient estimation of parameters in each iteration. Remarkably, with incremental computation, we reduce the complexity of parameter estimation from O(n) to O(1). Experiments on real datasets demonstrate the superiority of our proposal over state-of-the-art approaches. In particular, we show that the proposed repairing indeed improves time series classification.
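The O(n)-to-O(1) reduction can be illustrated with a minimal sketch. The abstract does not state the estimator, so the code below assumes, purely for illustration, an order-1 autoregressive coefficient fitted by least squares over a residual series z, and assumes that each iteration changes a single point. The names full_sums, update_sums and phi are hypothetical and are not taken from the paper.

```python
def full_sums(z):
    """O(n) pass: sums needed for a least-squares AR(1) fit (illustrative assumption).

    num = sum of z[t] * z[t+1], den = sum of z[t]**2, for t = 0 .. n-2.
    """
    num = sum(z[t] * z[t + 1] for t in range(len(z) - 1))
    den = sum(z[t] ** 2 for t in range(len(z) - 1))
    return num, den


def update_sums(z, num, den, k, new_value):
    """O(1) update of (num, den) after z[k] is changed to new_value.

    Only the terms that contain z[k] are adjusted; z is modified in place.
    """
    n = len(z)
    old = z[k]
    delta = new_value - old
    if k > 0:
        # term z[k-1] * z[k] in num
        num += z[k - 1] * delta
    if k < n - 1:
        # term z[k] * z[k+1] in num, and z[k]**2 in den
        num += delta * z[k + 1]
        den += new_value ** 2 - old ** 2
    z[k] = new_value
    return num, den


def phi(num, den):
    """AR(1) coefficient from the two sums; 0.0 if the denominator vanishes."""
    return num / den if den != 0 else 0.0


if __name__ == "__main__":
    z = [0.0, 1.0, 0.5, -0.2, 0.0]
    num, den = full_sums(z)
    num, den = update_sums(z, num, den, 2, 0.9)  # one point changes
    print(phi(num, den))
```

Under these assumptions, a repair that changes one point touches at most two product terms and one squared term, so the sums can be maintained without rescanning the series.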