Paper Title

Sea: A lightweight data-placement library for Big Data scientific computing

Authors

Valérie Hayot-Sasson, Mathieu Dugré, Tristan Glatard

Abstract

The recent influx of open scientific data has contributed to the transition of scientific computing from compute-intensive to data-intensive. Although many Big Data frameworks exist that minimize the cost of data transfers, few scientific applications integrate these frameworks or adopt data-placement strategies to mitigate these costs. Scientific applications commonly rely on well-established command-line tools that would require complete reinstrumentation to incorporate existing frameworks. We developed Sea to enable data-placement strategies for scientific applications executing on HPC clusters without the need to reinstrument workflows. Sea leverages GNU C library interception to intercept POSIX-compliant file system calls made by the applications. We designed a performance model and evaluated the performance of Sea on a synthetic data-intensive application processing a representative neuroimaging dataset (the Big Brain). Our results demonstrate that Sea significantly improves performance, by up to a factor of 3.
