Paper title
Visualizing probability distributions across bivariate cyclic temporal granularities
Paper authors
Paper abstract
Deconstructing a time index into time granularities can assist in the exploration and automated analysis of large temporal data sets. This paper describes classes of time deconstructions using linear and cyclic time granularities. Linear granularities respect the linear progression of time, such as hours, days, weeks and months. Cyclic granularities can be circular, such as hour-of-the-day; quasi-circular, such as day-of-the-month; or aperiodic, such as public holidays. The hierarchical structure of granularities creates a nested ordering: hour-of-the-day and second-of-the-minute are single-order-up, whereas hour-of-the-week is multiple-order-up because it passes over day-of-the-week. Methods are provided for creating all possible granularities for a time index. A recommendation algorithm indicates whether a pair of granularities can be meaningfully examined together (a "harmony") or cannot (a "clash"). Time granularities can be used to create data visualizations that explore periodicities, associations and anomalies. The granularities form categorical variables (ordered or unordered) which induce groupings of the observations. Assuming a numeric response variable, the resulting graphics are displays of distributions compared across combinations of these categorical variables. The methods are implemented in the open source R package `gravitas`, are consistent with a tidy workflow, and examine probability distributions using the range of graphics available in `ggplot2`.
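The distinction between single-order-up and multiple-order-up cyclic granularities, and the idea of a clash, can be illustrated with a minimal Python sketch. It is not the `gravitas` implementation: the function names `cyclic_granularities` and `empty_combinations` are hypothetical, and counting level combinations that never occur is only assumed here as a rough proxy for a clash.

```python
from datetime import datetime, timedelta


def cyclic_granularities(ts):
    """Map a timestamp to a few zero-based cyclic granularities (illustrative only)."""
    hour_of_day = ts.hour          # single-order-up: hour within day
    day_of_week = ts.weekday()     # single-order-up: day within week, Monday = 0
    return {
        "hour_of_day": hour_of_day,
        "day_of_week": day_of_week,
        # multiple-order-up: hour within week, passing over day-of-the-week
        "hour_of_week": day_of_week * 24 + hour_of_day,
        # quasi-circular: the number of levels varies by month
        "day_of_month": ts.day - 1,
    }


def empty_combinations(index, gran_a, gran_b):
    """Count pairs of observed levels of gran_a and gran_b that never co-occur.

    Assumption: a count above zero is treated as a sign of a clash.
    """
    rows = [cyclic_granularities(ts) for ts in index]
    levels_a = {row[gran_a] for row in rows}
    levels_b = {row[gran_b] for row in rows}
    observed = {(row[gran_a], row[gran_b]) for row in rows}
    return len(levels_a) * len(levels_b) - len(observed)


# Eight weeks of hourly timestamps starting on a Monday.
start = datetime(2024, 1, 1)
index = [start + timedelta(hours=h) for h in range(24 * 7 * 8)]

harmony_like = empty_combinations(index, "hour_of_day", "day_of_week")   # 0
clash_like = empty_combinations(index, "hour_of_week", "day_of_week")    # 1008
```

Every hour-of-the-day occurs on every day-of-the-week, so that pair leaves no empty cells. Each hour-of-the-week belongs to exactly one day-of-the-week, so most cells of that pairing are structurally empty and a plot faceted on both would be largely blank.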