Paper Title

NoSQL Database Tuning through Machine Learning

Authors

Florian Eppinger, Uta Störl

Abstract

NoSQL databases have become an important component of many big data and real-time web applications. Their distributed nature and scalability make them an ideal data storage repository for a variety of use cases. While NoSQL databases are delivered with a default "off-the-shelf" configuration, they offer configuration settings to adjust a database's behavior and performance to a specific use case and environment. The abundance and oftentimes imperceptible inter-dependencies of configuration settings make it difficult to optimize and performance-tune a NoSQL system. There is no one-size-fits-all configuration, and therefore the workload, the physical design, and the available resources need to be taken into account when optimizing the configuration of a NoSQL database. This work explores Machine Learning as a means to automatically tune a NoSQL database for optimal performance. Using the Random Forest and Gradient Boosting Decision Tree Machine Learning algorithms, multiple Machine Learning models were fitted to a training dataset that incorporates properties of the NoSQL physical configuration (replication and sharding). The best models were then employed as surrogate models to optimize the Database Management System's configuration settings for throughput and latency using a Black-box Optimization algorithm. Multiple experiments on an Apache Cassandra database demonstrate the feasibility of this approach, even across varying physical configurations. Compared with the default configuration settings, the tuned DBMS configurations yielded throughput improvements of up to 4%, read latency reductions of up to 43%, and write latency reductions of up to 39%.
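The surrogate-model workflow summarized in the abstract (benchmark some configurations, fit a tree-ensemble model, then search the model instead of the live database) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the benchmark is replaced by an invented function `synthetic_throughput`; the knob names, ranges, and `DEFAULT_KNOBS` are hypothetical; the surrogate is a small hand-rolled random forest (the abstract also uses Gradient Boosting, omitted here); plain random search stands in for the unspecified black-box optimization algorithm; and only throughput, not latency, is optimized.

```python
import random
import statistics

# HYPOTHETICAL search space: two DBMS knobs (names borrowed from cassandra.yaml)
# plus one physical-design feature. Ranges are illustrative, not from the paper.
SPACE = {
    "concurrent_reads": (8, 128),
    "concurrent_writes": (8, 128),
    "replication_factor": (1, 3),
}
DEFAULT_KNOBS = [32, 32]  # hypothetical "off-the-shelf" values for the two knobs


def synthetic_throughput(cfg):
    """Invented stand-in for a real benchmark run (ops/s, higher is better)."""
    reads, writes, rf = cfg
    return 1000 - 0.05 * (reads - 96) ** 2 - 0.03 * (writes - 64) ** 2 - 80 * (rf - 1)


def random_config(rng):
    """Draw one configuration uniformly from SPACE (physical feature included)."""
    return [rng.randint(lo, hi) for lo, hi in SPACE.values()]


def sse(values):
    """Sum of squared errors around the mean: the regression-tree split criterion."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, ys, depth, rng, max_features=2):
    """Grow a depth-limited regression tree; leaves store the mean target."""
    if depth == 0 or len(rows) < 4 or len(set(ys)) == 1:
        return statistics.fmean(ys)
    best = None
    # Random forests consider only a random subset of features at each split.
    for f in rng.sample(range(len(rows[0])), max_features):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if left and right:
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:  # the sampled features cannot separate these rows
        return statistics.fmean(ys)
    _, f, t = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                          depth - 1, rng, max_features)

    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, depth=5, seed=0):
    """Random forest: each tree is grown on a bootstrap sample of the benchmark data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, rng))
    return forest


def predict_forest(forest, x):
    """Surrogate prediction: average of all tree predictions."""
    return statistics.fmean(predict_tree(tree, x) for tree in forest)


def tune(forest, replication_factor, n_candidates=2000, seed=1):
    """Random search on the surrogate, holding the physical design fixed."""
    rng = random.Random(seed)
    (r_lo, r_hi), (w_lo, w_hi) = SPACE["concurrent_reads"], SPACE["concurrent_writes"]
    candidates = [[rng.randint(r_lo, r_hi), rng.randint(w_lo, w_hi), replication_factor]
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda c: predict_forest(forest, c))


if __name__ == "__main__":
    rng = random.Random(42)
    X = [random_config(rng) for _ in range(80)]  # configurations "benchmarked" offline
    y = [synthetic_throughput(c) for c in X]
    forest = fit_forest(X, y)
    for rf in (1, 3):
        tuned = tune(forest, rf)
        default = DEFAULT_KNOBS + [rf]
        print(f"rf={rf} tuned knobs={tuned[:2]} throughput "
              f"{synthetic_throughput(default):.1f} -> {synthetic_throughput(tuned):.1f}")
```

Holding `replication_factor` fixed inside `tune` mirrors the idea of reusing one surrogate across physical configurations: the physical design is an input feature the model learns from, but only the DBMS knobs are searched. In a real setup each training row would come from an actual benchmark run against the cluster, which is why evaluating thousands of candidates on the surrogate is far cheaper than benchmarking them.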