Paper title
Time Series Data Cleaning with Regular and Irregular Time Intervals
Authors
Abstract
Errors are prevalent in time series data, especially in the industrial field. Data with errors cannot be stored in the database, which results in the loss of data assets. Handling dirty data in time series is non-trivial when the time intervals are irregular. At present, to deal with time series that contain errors, besides keeping the original erroneous data, discarding it, or checking it manually, we can also apply cleaning algorithms widely used in databases to clean the time series data automatically. This survey provides a classification of time series data cleaning techniques and comprehensively reviews the state-of-the-art methods of each type, with a special focus on irregular time intervals. In addition, we summarize data cleaning tools, systems, and evaluation criteria from research and industry. Finally, we highlight possible directions for time series data cleaning.