funcX: Federated Function as a Service for Science

Paper Title

funcX: Federated Function as a Service for Science

Authors

Li, Zhuozhao; Chard, Ryan; Babuji, Yadu; Galewsky, Ben; Skluzacek, Tyler; Nagaitsev, Kirill; Woodard, Anna; Blaiszik, Ben; Bryan, Josh; Katz, Daniel S.; Foster, Ian; Chard, Kyle

Abstract

funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and supercomputers, in effect turning them into function serving systems. funcX's cloud-hosted service provides a single location for registering, sharing, and managing both functions and endpoints. It allows for transparent, secure, and reliable function execution across the federated ecosystem of endpoints, enabling users to route functions to endpoints based on specific needs. funcX uses containers (e.g., Docker, Singularity, and Shifter) to provide common execution environments across endpoints. funcX implements various container management strategies to execute functions with high performance and efficiency on diverse funcX endpoints. funcX also integrates with an in-memory data store and Globus for managing data that may span endpoints. We motivate the need for funcX, present our prototype design and implementation, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130,000 concurrent workers. We show that funcX's container warming-aware routing algorithm can reduce the completion time for 3000 functions by up to 61% compared to a randomized algorithm, and that the in-memory data store can speed up data transfers by up to 3x compared to a shared file system.
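
The abstract contrasts warming-aware routing with randomized routing. The toy simulation below is a minimal sketch of that general idea, not funcX's actual algorithm or API: the `Worker` class, the two routing functions, and the rule that each worker keeps exactly one container warm are all assumptions made for illustration. It counts cold starts only and does not model completion time.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Worker:
    """Hypothetical worker that keeps at most one container warm."""
    name: str
    warm_container: Optional[str] = None
    cold_starts: int = 0

    def run(self, container: str) -> None:
        # A cold start happens whenever the needed container is not already warm.
        if self.warm_container != container:
            self.cold_starts += 1
            self.warm_container = container


def route_random(workers: List[Worker], container: str, rng: random.Random) -> Worker:
    """Baseline: pick any worker, ignoring which containers are warm."""
    return rng.choice(workers)


def route_warm_aware(workers: List[Worker], container: str, rng: random.Random) -> Worker:
    """Prefer a worker whose warm container matches the task's container."""
    warm = [w for w in workers if w.warm_container == container]
    if warm:
        return rng.choice(warm)
    # No match: prefer a worker with nothing loaded, so warm containers are not evicted.
    empty = [w for w in workers if w.warm_container is None]
    return rng.choice(empty or workers)


def count_cold_starts(route: Callable, tasks: List[str], n_workers: int = 4, seed: int = 0) -> int:
    """Run every task through the given router and return the total cold starts."""
    rng = random.Random(seed)
    workers = [Worker(f"w{i}") for i in range(n_workers)]
    for container in tasks:
        route(workers, container, rng).run(container)
    return sum(w.cold_starts for w in workers)


if __name__ == "__main__":
    # 200 tasks cycling through four hypothetical container types.
    tasks = ["a", "b", "c", "d"] * 50
    print("random routing cold starts:    ", count_cold_starts(route_random, tasks))
    print("warming-aware routing cold starts:", count_cold_starts(route_warm_aware, tasks))
```

In this setup, with four container types and four workers, the warming-aware router pays one cold start per container type and then reuses warm containers, while the random router keeps evicting them.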
