Title
Coffea -- Columnar Object Framework For Effective Analysis
Abstract
The coffea framework provides a new approach to High-Energy Physics analysis based on columnar operations, improving the time-to-insight, scalability, portability, and reproducibility of analyses. It is implemented with the Python programming language, the scientific Python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach that separates the analysis implementation from the data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages, which are wrapped so that the purpose of the resulting user code can be quickly intuited. Various data delivery schemes are wrapped into a common front-end that accepts user inputs and code and returns user-defined outputs. We discuss our experience implementing an analysis of CMS data with the coffea framework, together with the user experience and future directions.
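The factorized design described above can be illustrated with a minimal sketch. It assumes that plain NumPy arrays stand in for event data. The analysis step is a purely column-wise function of arrays, with no event loop. A separate executor decides how chunks of data are delivered and merges the per-chunk outputs. `DimuonMassProcessor`, `iterative_executor`, and the `mu1_pt`-style column names are hypothetical illustrations, not coffea's actual API. The invariant mass uses the massless two-body approximation.

```python
import numpy as np


class DimuonMassProcessor:
    """Hypothetical analysis step (not coffea's API): pure column-wise code
    that knows nothing about where its input chunk came from."""

    def __init__(self, bins):
        self.bins = np.asarray(bins, dtype=float)

    def process(self, events):
        # `events` is assumed to be a dict of equal-length NumPy arrays,
        # one entry per event, standing in for a real event-data chunk.
        pt1, pt2 = events["mu1_pt"], events["mu2_pt"]
        deta = events["mu1_eta"] - events["mu2_eta"]
        dphi = events["mu1_phi"] - events["mu2_phi"]
        # Massless two-body invariant mass, computed for all events at once.
        mass = np.sqrt(2.0 * pt1 * pt2 * (np.cosh(deta) - np.cos(dphi)))
        # Opposite-charge selection expressed as a boolean mask over columns.
        opposite = events["mu1_charge"] != events["mu2_charge"]
        counts, _ = np.histogram(mass[opposite], bins=self.bins)
        return {
            "n_events": len(pt1),
            "n_selected": int(opposite.sum()),
            "mass_hist": counts,
        }


def iterative_executor(chunks, processor):
    """Hypothetical data-delivery scheme: process chunks one at a time and
    merge the outputs by summing matching keys."""
    total = None
    for chunk in chunks:
        out = processor.process(chunk)
        if total is None:
            total = out
        else:
            total = {key: total[key] + out[key] for key in total}
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(42)

    def random_chunk(n):
        # Toy muon-pair columns; the ranges are arbitrary illustrative choices.
        return {
            "mu1_pt": rng.uniform(20.0, 100.0, n),
            "mu2_pt": rng.uniform(20.0, 100.0, n),
            "mu1_eta": rng.uniform(-2.4, 2.4, n),
            "mu2_eta": rng.uniform(-2.4, 2.4, n),
            "mu1_phi": rng.uniform(-np.pi, np.pi, n),
            "mu2_phi": rng.uniform(-np.pi, np.pi, n),
            "mu1_charge": rng.choice([-1, 1], n),
            "mu2_charge": rng.choice([-1, 1], n),
        }

    processor = DimuonMassProcessor(np.linspace(0.0, 200.0, 41))
    result = iterative_executor([random_chunk(1000) for _ in range(4)], processor)
    print(result["n_events"], result["n_selected"])
```

Because `process` never sees how chunks arrive, a parallel or distributed executor could in principle replace `iterative_executor` without any change to the analysis code. This is the kind of separation between analysis implementation and data delivery that the abstract attributes to coffea.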