Paper title
Reproducibility of Parallel Preconditioned Conjugate Gradient in Hybrid Programming Environments
Paper authors
Paper abstract
The Preconditioned Conjugate Gradient method is often employed to solve linear systems of equations arising in numerical simulations of physical phenomena. Although widely used, the solver is also known for its lack of accuracy when computing the residual. In this article, we propose two algorithmic solutions that originate from the ExBLAS project to enhance the accuracy of the solver and to ensure its reproducibility in a hybrid MPI + OpenMP tasks programming environment. One is based on ExBLAS and preserves every bit of information until the final rounding, while the other relies upon floating-point expansions and hence extends the intermediate precision. Instead of converting the entire solver into its ExBLAS-related implementation, we identify the parts that violate reproducibility because floating-point operations are non-associative, secure them, and combine them with sequential executions. These algorithmic strategies are reinforced with programmability suggestions to ensure deterministic executions. Finally, we verify these approaches on two modern HPC systems: both versions deliver a reproducible number of iterations, residuals, direct errors, and solution vectors, with an overhead of less than 37.7% on 768 cores.
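To illustrate the two ideas the abstract names, the following minimal Python sketch shows why a floating-point reduction can depend on the order of its operands, and how exact accumulation with a single final rounding, or a floating-point expansion built from error-free additions, removes that dependence. It is not the ExBLAS implementation: the function names (`two_sum`, `naive_sum`, `exact_sum`, `expansion_sum`), the expansion size of 4, and the use of Python's `fractions.Fraction` as a stand-in for a long accumulator are all assumptions made for illustration.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free addition: return (s, err) with s = fl(a + b) and a + b == s + err."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def naive_sum(values):
    """Plain left-to-right summation; every addition rounds, so order matters."""
    total = 0.0
    for v in values:
        total += v
    return total


def exact_sum(values):
    """Accumulate exactly (Fraction stands in for a long accumulator), round once."""
    return float(sum(Fraction(v) for v in values))


def expansion_sum(values, size=4):
    """Accumulate into a floating-point expansion of `size` components.

    Each input is pushed through the components with two_sum; any error that
    does not fit is spilled into an exact fallback (an assumption of this
    sketch). The components are combined exactly and rounded once at the end.
    """
    fpe = [0.0] * size
    spill = Fraction(0)
    for v in values:
        x = v
        for i in range(size):
            fpe[i], x = two_sum(fpe[i], x)
            if x == 0.0:
                break
        if x != 0.0:
            spill += Fraction(x)
    return float(spill + sum(Fraction(c) for c in fpe))


# The same four numbers in two different orders, as could happen when a
# parallel reduction combines partial results in a different sequence.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

print(naive_sum(order_a), naive_sum(order_b))          # results differ
print(exact_sum(order_a), exact_sum(order_b))          # results agree
print(expansion_sum(order_a), expansion_sum(order_b))  # results agree
```

In a Preconditioned Conjugate Gradient solver, reductions of this kind appear in the dot products, which is where a change in the order of partial sums between runs can alter residuals and iteration counts. The sketch only covers summation. A reproducible dot product would also need the products themselves to be captured without rounding error.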