通过析取向列引擎的析取优化查询谓词

论文标题

通过析取向列引擎的析取优化查询谓词

Optimizing Query Predicates with Disjunctions for Column-Oriented Engines

论文作者

Kim, Albert, Ileri, Atalay Mert, Madden, Sam

论文摘要

数据库研究始终给予有限的关注，以通过析取优化谓词。过去的工作很少，主要集中在针对传统的面向行导向数据库的优化上。但是，面向行导向和面向列的引擎如何评估谓词之间的关键区别在于，虽然面向行的引擎一次将谓词应用于单个元组，但面向柱的引擎将谓词应用于一组单元，并将另一个维度添加到问题中。因此，面向行的引擎仅关注将谓词应用于“短路”的最佳顺序，但面向列的引擎必须额外决定每个谓词应用程序的输入组。这很重要，因为较小的输入会导致更快的运行时间，而且非平地，因为可以使用早期谓词的结果将输入减少到较晚的谓词和谓词可以通过谓词表达式中的分离组合。在这项工作中，我们正式分析了面向柱的引擎的谓词评估问题，并呈现BESTD/Update，这是有史以来第一个多项式时间，可证明是最佳的算法，以推导每个谓词应用程序的最小输入集。 BESTD/UPDATE的最佳性能在各种成本模型下保证，代表不同的现实情况。此外，当与谓词订购算法Hanani结合使用时，BESTD/更新将减少为Adepred，这是一种简单的O（n log^2 N）算法，我们建议使用该算法，用于实际使用和最佳的嵌套深度2或更低谓词表达式。我们的评估表明，由于其最佳性和多项式计划时间，评估的表现优于没有实施任何分离优化的优化，并分别将最高2.6倍和28X用于合成工作负载和最多1.3倍和100倍的最高可查询和100倍的查询。

Database research has always given limited attention to optimizing predicates with disjunctions. What little past work there is, has mostly focused on optimizations for traditional row-oriented databases. However, a key difference between how row-oriented and column-oriented engines evaluate predicates is that while row-oriented engines apply predicates to a single tuple at a time, column-oriented engines apply predicates to sets of tuples, adding another dimension to the problem. As such, row-oriented engines focus only on the best order to apply predicates in to "short-circuit" the overall predicate expression, but column-oriented engines must additionally decide on the input sets of tuples for each predicate application. This is important, since smaller inputs lead to faster runtimes, and nontrivial, since the results of earlier predicates can be used to reduce the inputs to later predicates and predicates may be combined via disjunctions in the predicate expression. In this work, we formally analyze the predicate evaluation problem for column-oriented engines and present BestD/Update, the first ever polynomial-time, provably optimal algorithms to deduce the minimum input sets for each predicate application. BestD/Update's optimality is guaranteed under a wide range of cost models, representing different real-world scenarios. Furthermore, when combined with the predicate ordering algorithm Hanani, BestD/Update reduce into EvalPred, a simple O(n log^2 n) algorithm, which we recommend for practical use and optimal for all predicate expressions of nested depth 2 or less. Our evaluation shows, thanks to its optimality and polynomial planning time, EvalPred outperforms not implementing any disjunction optimizations and exiting optimal algorithms by up to 2.6x and 28x respectively for synthetic workloads and by up to 1.3x and 100x respectively for queries from TPC-H and the CH-benchmark.

下载PDF全文

下载文献需遵守相关版权规定

论文标题