Multiple-GPU accelerated high-order gas-kinetic scheme for direct numerical simulation of compressible turbulence

Title

Multiple-GPU accelerated high-order gas-kinetic scheme for direct numerical simulation of compressible turbulence

Authors

Yuhang Wang, Guiyu Cao, Liang Pan

Abstract

The high-order gas-kinetic scheme (HGKS) has become a workable tool for the direct numerical simulation (DNS) of turbulence. In this paper, to accelerate the computation, HGKS is implemented on the graphics processing unit (GPU) using the compute unified device architecture (CUDA). To conduct much larger-scale DNS of turbulence, HGKS is further upgraded to run on multiple GPUs using the message passing interface (MPI) together with CUDA. Benchmark cases for compressible turbulence, including the Taylor-Green vortex and turbulent channel flows, are presented to assess the numerical performance of HGKS on Nvidia TITAN RTX and Tesla V100 GPUs. For single-GPU computation, compared with the parallel central processing unit (CPU) code running on the Intel Core i7-9700 with open multi-processing (OpenMP) directives, a 7x speedup is achieved by the TITAN RTX and a 16x speedup by the Tesla V100. For multiple-GPU computation, the computational time of the parallel CPU code running on 1024 Intel Xeon E5-2692 cores with MPI is approximately 3 times longer than that of the GPU code using 8 Tesla V100 GPUs with MPI and CUDA. Numerical results confirm the excellent performance of multiple-GPU accelerated HGKS for large-scale DNS of turbulence. HGKS on the GPU is also compiled with FP32 precision to evaluate the effect of number-format precision. As expected, compared with the FP64 computation, FP32 precision improves efficiency and reduces memory cost. For turbulent channel flows, the difference in long-time statistical turbulent quantities between the FP32 and FP64 solutions is acceptable. However, an obvious discrepancy in instantaneous turbulent quantities can be observed, which shows that FP32 precision is not safe for DNS of compressible turbulence. The choice of precision should depend on the accuracy requirements and the available computational resources.
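
The FP32 versus FP64 trade-off described in the abstract can be illustrated with a toy problem. The sketch below is not the HGKS solver and does not reproduce any result from the paper. It assumes a chaotic logistic map (with a hypothetical parameter `r = 3.9`) as a stand-in for a chaotic flow, and it emulates FP32 by rounding each step's FP64 result to FP32 with the standard `struct` module. All function and variable names are illustrative. Under these assumptions, storage per value halves with FP32, the instantaneous values of the two runs drift far apart, and the long-time means typically stay much closer.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def logistic_orbit(x0, steps, fp32=False, r=3.9):
    """Iterate x <- r*x*(1-x); in FP32 mode, round the state after every step.

    Note: this emulates FP32 storage of the state, not true FP32 arithmetic.
    """
    x = to_fp32(x0) if fp32 else x0
    orbit = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        if fp32:
            x = to_fp32(x)
        orbit.append(x)
    return orbit

steps = 5000
orbit64 = logistic_orbit(0.3, steps)
orbit32 = logistic_orbit(0.3, steps, fp32=True)

# Storage cost per value in bytes (standard sizes).
bytes_fp32 = struct.calcsize("<f")
bytes_fp64 = struct.calcsize("<d")

# Largest pointwise ("instantaneous") difference between the two runs.
max_instant_gap = max(abs(a - b) for a, b in zip(orbit64, orbit32))

# Difference between the long-time averages of the two runs.
mean_gap = abs(sum(orbit64) / steps - sum(orbit32) / steps)

print(f"bytes per value: FP32={bytes_fp32}, FP64={bytes_fp64}")
print(f"max instantaneous gap: {max_instant_gap:.3f}")
print(f"gap in long-time mean: {mean_gap:.5f}")
```

In a chaotic system, rounding differences grow quickly, so pointwise agreement between precisions is lost after a short time even when averaged quantities remain comparable. This is the same qualitative distinction the abstract draws between instantaneous and long-time statistical turbulent quantities.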
