Paper Title

Scalable Querying of Nested Data

Authors

Jaclyn Smith, Michael Benedikt, Milos Nikolic, Amir Shaikhha

Abstract

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions. In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.
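To make the idea of "shredded queries" concrete, here is a minimal single-machine sketch in Python. It is an illustration of the general shredding idea only, not the paper's implementation: the function names (`shred`, `unshred`, `total_per_customer`), the integer labels, and the customer/order data are all assumptions introduced here. A nested collection is split into a flat top-level table and a flat table of inner items, linked by a generated label, so that a query over the nested data becomes ordinary flat grouping and lookup.

```python
from collections import defaultdict
from itertools import count


def shred(customers):
    """Split a two-level nested collection into two flat tables.

    Each inner bag is replaced by a generated label (hypothetical scheme);
    the inner items move to a separate flat table keyed by that label.
    """
    labels = count()
    top, orders = [], []
    for customer in customers:
        label = next(labels)
        top.append({"name": customer["name"], "orders": label})
        for order in customer["orders"]:
            orders.append({"label": label, **order})
    return top, orders


def total_per_customer(top, orders):
    """Evaluate a nested aggregate using only flat operations.

    Step 1: group the flat inner table by label and sum the amounts.
    Step 2: look each top-level row's label up in the grouped result.
    """
    sums = defaultdict(float)
    for order in orders:
        sums[order["label"]] += order["amount"]
    return {row["name"]: sums[row["orders"]] for row in top}


def unshred(top, orders):
    """Rebuild the nested collection from its shredded form."""
    by_label = defaultdict(list)
    for order in orders:
        item = {k: v for k, v in order.items() if k != "label"}
        by_label[order["label"]].append(item)
    return [{"name": r["name"], "orders": by_label[r["orders"]]} for r in top]


if __name__ == "__main__":
    data = [
        {"name": "a", "orders": [{"amount": 1.0}, {"amount": 2.0}]},
        {"name": "b", "orders": []},
    ]
    top, orders = shred(data)
    print(total_per_customer(top, orders))
    print(unshred(top, orders) == data)
```

Because the inner items live in their own flat table, they can in principle be partitioned independently of the top-level rows, which is the property that makes load balancing under skewed cardinalities tractable; the sketch does not attempt to model distribution or skew handling itself.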
