A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism

Title

A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism

Authors

Fan, Zhiwei; Sen, Rathijit; Koutris, Paraschos; Albarghouthi, Aws

Abstract

There is a large body of recent work applying machine learning (ML) techniques to query optimization and query performance prediction in relational database management systems (RDBMSs). However, these works typically do not account for intra-parallelism, a key mechanism used in practice to boost the performance of OLAP queries, when predicting query performance. In this paper, we take a first step towards filling this gap by studying the problem of tuning the degree of parallelism (DOP) via ML techniques in Microsoft SQL Server, a popular commercial RDBMS that allows an individual query to execute using multiple cores. In our study, we cast DOP tuning as a regression task and examine how several popular ML models can help with query performance prediction in a multi-core setting. We explore the design space and perform an extensive experimental study comparing different models against a list of performance metrics, testing how well they generalize in four settings: (i) to queries from the same template, (ii) to queries from a new template, (iii) to instances of different scale, and (iv) to different instances and queries. Our experimental results show that a simple featurization of the input query plan that ignores cost model estimations can accurately predict query performance, capture the speedup trend with respect to the available parallelism, and help automatically choose an optimal per-query DOP.
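The abstract frames DOP tuning as regression: predict a query's runtime at each candidate DOP from features of its plan, then pick the DOP with the lowest prediction. The following is a minimal Python sketch of that loop under stated assumptions, not the paper's implementation. It assumes a plan can be reduced to counts of a hypothetical list of physical-operator names (ignoring cost-model estimates, as the abstract describes), and it uses a per-DOP k-nearest-neighbour regressor as a stand-in model; the paper compares several ML models, and this is not claimed to be one of them. The names `OPERATORS`, `featurize`, `PerDopKnnRegressor`, and all runtimes are illustrative.

```python
from collections import Counter
from math import dist
from statistics import mean

# Hypothetical operator vocabulary, for illustration only; a real featurizer
# would enumerate the physical operators that occur in the actual plans.
OPERATORS = ["Scan", "Seek", "HashJoin", "MergeJoin", "Sort", "Aggregate", "Exchange"]


def featurize(plan_ops):
    """Map a plan (list of operator names) to operator counts; no cost estimates."""
    counts = Counter(plan_ops)
    return [float(counts[op]) for op in OPERATORS]


class PerDopKnnRegressor:
    """Runtime regressor that keeps a separate k-NN training set per DOP value."""

    def __init__(self, k=3):
        self.k = k
        self.samples = {}  # dop -> list of (feature_vector, runtime_seconds)

    def fit(self, observations):
        # observations: iterable of (plan_ops, dop, runtime_seconds)
        for plan_ops, dop, runtime in observations:
            self.samples.setdefault(dop, []).append((featurize(plan_ops), runtime))
        return self

    def predict(self, plan_ops, dop):
        # Average the runtimes of the k training plans closest in feature space.
        x = featurize(plan_ops)
        nearest = sorted(self.samples[dop], key=lambda s: dist(x, s[0]))[: self.k]
        return mean(runtime for _, runtime in nearest)

    def choose_dop(self, plan_ops):
        # Regression-based DOP tuning: pick the DOP with the lowest predicted runtime.
        return min(self.samples, key=lambda dop: self.predict(plan_ops, dop))

    def speedup_curve(self, plan_ops):
        # Predicted speedup relative to serial execution (requires DOP 1 in training).
        base = self.predict(plan_ops, 1)
        return {dop: base / self.predict(plan_ops, dop) for dop in sorted(self.samples)}


# Made-up measurements for two training plans at DOP 1, 4, and 8.
analytic = ["Scan", "HashJoin", "Aggregate", "Exchange"]
lookup = ["Seek", "Sort"]
training = [
    (analytic, 1, 40.0), (analytic, 4, 12.0), (analytic, 8, 8.0),
    (lookup, 1, 0.5), (lookup, 4, 0.6), (lookup, 8, 0.9),
]
model = PerDopKnnRegressor(k=1).fit(training)

big_query = ["Scan", "HashJoin", "HashJoin", "Aggregate", "Exchange"]
small_query = ["Seek", "Sort", "Sort"]
print(model.choose_dop(big_query), model.choose_dop(small_query))
print(model.speedup_curve(big_query))
```

With `k=1` each new query inherits the runtimes of its nearest training plan, so the join-heavy query is assigned DOP 8 while the lookup-style query stays at DOP 1, and `speedup_curve` exposes the predicted speedup trend across DOP values. A realistic setup would need many more training plans and a model chosen through comparisons like the ones the abstract describes.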