Paper Title

KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources

Authors

Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari

Abstract

Online inference is becoming a key service product for many businesses, deployed in cloud platforms to meet customer demands. Despite their revenue-generation capability, these services need to operate under tight Quality-of-Service (QoS) and cost budget constraints. This paper introduces KAIROS, a novel runtime framework that maximizes query throughput while meeting a QoS target and a cost budget. KAIROS designs and implements novel techniques to build a pool of heterogeneous compute hardware without online exploration overhead, and to distribute inference queries optimally at runtime. Our evaluation using industry-grade deep learning (DL) models shows that KAIROS yields up to 2X the throughput of an optimal homogeneous solution, and outperforms state-of-the-art schemes by up to 70%, despite advantageous implementations of the competing schemes that ignore their exploration overhead.
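To make the problem setting in the abstract concrete, the sketch below is a minimal toy illustration (not the KAIROS algorithm itself): given hypothetical instance types with assumed per-query latency, throughput, and hourly price, it greedily assembles a heterogeneous pool under an hourly cost budget and splits incoming queries across it, dropping any hardware that cannot meet the QoS latency target. All instance names, numbers, and the greedy policy are assumptions for illustration; the paper's actual techniques (building the pool without online exploration and distributing queries optimally) are more sophisticated.

```python
# Toy illustration of the problem KAIROS targets -- NOT the paper's algorithm.
# All instance types, prices, and the greedy policy are hypothetical.
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    latency_ms: float      # assumed per-query latency on this hardware
    throughput_qps: float  # sustainable queries/second per instance
    cost_per_hour: float   # assumed cloud price of one instance

def build_pool(candidates, qos_ms, budget_per_hour):
    """Greedy toy policy: among types meeting the QoS latency target,
    keep adding the best throughput-per-dollar instance until the
    hourly budget is exhausted."""
    feasible = [c for c in candidates if c.latency_ms <= qos_ms]
    feasible.sort(key=lambda c: c.throughput_qps / c.cost_per_hour, reverse=True)
    pool, spend = [], 0.0
    for c in feasible:
        while spend + c.cost_per_hour <= budget_per_hour:
            pool.append(c)
            spend += c.cost_per_hour
    return pool, spend

def split_queries(pool, incoming_qps):
    """Distribute load proportionally to each instance's capacity."""
    if not pool:
        return [], 0.0
    capacity = sum(i.throughput_qps for i in pool)
    served = min(incoming_qps, capacity)
    shares = [served * i.throughput_qps / capacity for i in pool]
    return shares, served

if __name__ == "__main__":
    candidates = [  # hypothetical offerings
        InstanceType("gpu-large", latency_ms=8,  throughput_qps=900, cost_per_hour=3.0),
        InstanceType("gpu-small", latency_ms=15, throughput_qps=400, cost_per_hour=1.1),
        InstanceType("cpu-only",  latency_ms=60, throughput_qps=80,  cost_per_hour=0.3),
    ]
    pool, spend = build_pool(candidates, qos_ms=50, budget_per_hour=6.0)
    _, served_qps = split_queries(pool, incoming_qps=2500)
    print(f"pool size={len(pool)}, hourly cost=${spend:.2f}, served={served_qps:.0f} qps")
```

In this toy setup, the cheaper GPU type happens to offer the best throughput per dollar, so a heterogeneous-aware policy fills the budget with it rather than defaulting to the single fastest device, which is the intuition behind why a well-chosen heterogeneous pool can out-serve an optimal homogeneous one at the same cost.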
