Paper Title
Optimizing Prediction Serving on Low-Latency Serverless Dataflow
Paper Authors
Paper Abstract
Prediction serving systems are designed to provide large volumes of low-latency inferences from machine learning models. These systems mix data processing and computationally intensive model inference and benefit from multiple heterogeneous processors and distributed computing resources. In this paper, we argue that a familiar dataflow API is well-suited to this latency-sensitive task, and amenable to optimization even with unmodified black-box ML models. We present the design of Cloudflow, a system that provides this API and realizes it on an autoscaling serverless backend. Cloudflow transparently implements performance-critical optimizations including operator fusion and competitive execution. Our evaluation shows that Cloudflow's optimizations yield significant performance improvements on synthetic workloads and that Cloudflow outperforms state-of-the-art prediction serving systems by as much as 2x on real-world prediction pipelines, meeting latency goals of demanding applications like real-time video analysis.
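To make the abstract's two optimizations concrete, here is a minimal Python sketch of a dataflow-style prediction pipeline. It is not Cloudflow's actual API; the `Map`, `Pipeline`, `fused`, and `run` names and the lambda stages are hypothetical stand-ins used to illustrate operator fusion (composing adjacent stages so they execute together) and competitive execution (racing replicas of a stage and taking the first result to mask stragglers).

```python
import concurrent.futures
from typing import Any, Callable, List


class Map:
    """A single dataflow stage: apply `fn` to each incoming element."""
    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn


class Pipeline:
    def __init__(self, stages: List[Map]):
        self.stages = stages

    def fused(self) -> "Pipeline":
        """Operator fusion: collapse consecutive Map stages into one,
        avoiding per-stage data movement between functions."""
        fns = [s.fn for s in self.stages]

        def composed(x):
            for fn in fns:
                x = fn(x)
            return x

        return Pipeline([Map(composed)])

    def run(self, x, replicas: int = 1):
        """Execute the pipeline; with replicas > 1, race identical copies
        of each stage (competitive execution) and keep the first result."""
        for stage in self.stages:
            if replicas == 1:
                x = stage.fn(x)
            else:
                pool = concurrent.futures.ThreadPoolExecutor(max_workers=replicas)
                futures = [pool.submit(stage.fn, x) for _ in range(replicas)]
                done, _ = concurrent.futures.wait(
                    futures, return_when=concurrent.futures.FIRST_COMPLETED)
                x = next(iter(done)).result()
                pool.shutdown(wait=False)  # don't block on the losing replicas
        return x


# Hypothetical usage: preprocess -> black-box model inference -> postprocess.
pipeline = Pipeline([
    Map(lambda frame: frame),           # stand-in for image preprocessing
    Map(lambda frame: {"label": 0}),    # stand-in for model inference
    Map(lambda pred: pred["label"]),
])
print(pipeline.fused().run("frame", replicas=2))
```

In this sketch, fusion removes the intermediate hand-offs between stages, and competitive execution trades extra compute for lower tail latency; the abstract states that Cloudflow applies such optimizations transparently over unmodified black-box models.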