Paper Title
Factor Windows: Cost-based Query Rewriting for Optimizing Correlated Window Aggregates
Paper Authors
Paper Abstract
Window aggregates are ubiquitous in stream processing. In Azure Stream Analytics (ASA), a stream processing service hosted by Microsoft's Azure cloud, we see many customer queries that contain aggregate functions (such as MIN and MAX) over multiple correlated windows (e.g., tumbling windows of length five minutes and ten minutes) defined on the same event stream. In this paper, we present a cost-based optimization framework for optimizing such queries by sharing computation among multiple windows. In particular, we introduce the notion of factor windows, which are auxiliary windows that are not in the input query but may nevertheless help reduce the overall computation cost, and our cost-based optimizer can produce rewritten query plans that have lower costs than the original query plan by utilizing factor windows. Since our optimization techniques are at the level of query (plan) rewriting, they can be implemented on any stream processing system that supports a declarative, SQL-like query language without changing the underlying query execution engine. We formalize the shared computation problem, present the optimization techniques in detail, and report evaluation results over both synthetic and real datasets. Our results show that, compared to the original query plans, the rewritten plans output by our cost-based optimizer can yield significantly higher (up to 16.8x) throughput.
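To make the sharing idea concrete, here is a minimal sketch in Python. It is not the paper's optimizer and does not reproduce its cost model. It assumes one event per time unit, tumbling windows only, and a decomposable aggregate (MIN). The helper names `tumbling_min` and `shared_tumbling_min` are hypothetical, and choosing the auxiliary window length as the greatest common divisor of the query windows is an illustrative assumption, not a rule taken from the abstract.

```python
from functools import reduce
from math import gcd


def tumbling_min(values, length):
    """MIN over consecutive, non-overlapping windows of `length` items.

    A trailing incomplete window is dropped.
    """
    return [
        min(values[i:i + length])
        for i in range(0, len(values) - length + 1, length)
    ]


def shared_tumbling_min(values, lengths):
    """Compute MIN for every window length in `lengths` with shared work.

    Assumption for illustration: the auxiliary window (not part of the
    query) has length gcd(lengths). Each query window is then built from
    the auxiliary window's partial results instead of from raw events.
    """
    factor = reduce(gcd, lengths)
    partial = tumbling_min(values, factor)  # computed once, reused below
    return {n: tumbling_min(partial, n // factor) for n in lengths}


# 60 synthetic events, one per time unit.
events = [(i * 37) % 101 for i in range(60)]

# Query windows of length 6 and 10 share an auxiliary window of length 2.
shared = shared_tumbling_min(events, [6, 10])
direct = {n: tumbling_min(events, n) for n in (6, 10)}
```

In this sketch neither query window can be derived from the other, since 10 is not a multiple of 6, yet both can be derived from the length-2 auxiliary window. That is the kind of saving an auxiliary window outside the input query can offer. Whether the saving outweighs the cost of maintaining the extra window is the question a cost-based optimizer has to answer.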