Title
CROFT: A scalable three-dimensional parallel Fast Fourier Transform (FFT) implementation for High Performance Clusters
Authors
Abstract
The FFT of three-dimensional (3D) input data is an important computational kernel in numerical simulations and is widely used in High Performance Computing (HPC) codes running on large numbers of processors. The performance of many scientific applications, such as Molecular Dynamics simulations, depends on the underlying 3D parallel FFT library. In this paper, we present C-DAC's three-dimensional Fast Fourier Transform (CROFT) library, which implements a three-dimensional parallel FFT using pencil decomposition. To exploit the hyperthreading capabilities of processor cores without degrading performance, CROFT is designed to use multithreading together with MPI. The CROFT implementation has the innovative feature of overlapping computation and memory I/O with MPI communication using multithreading. Unlike other 3D FFT implementations, CROFT uses only two threads, one of which is dedicated to communication so that communication can be effectively overlapped with computation. As a result, CROFT achieves a performance improvement of about 51% to 42% over the FFTW3 library, depending on the number of processes used.
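To make the two ideas in the abstract concrete, here is a minimal serial sketch, assuming NumPy, of a pencil-decomposed 3D FFT. The function names and the `p1 x p2` process-grid parameters are hypothetical illustrations, not CROFT's API. Each stage runs local 1D FFTs along one axis that every pencil holds in full, then redistributes the data into the next pencil layout. In a real code that redistribution is an MPI all-to-all; here it is simulated by writing into a shared array. A single worker thread stands in for the dedicated communication thread: it stores each finished pencil while the main thread computes the next one. The sketch shows the structure of the overlap only and makes no claim about speedup.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def split(n, parts):
    """Split range(n) into `parts` contiguous, near-equal (start, stop) blocks."""
    q, r = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + q + (1 if i < r else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def pencil_slices(shape, axis, p1, p2):
    """Yield index tuples for the p1*p2 pencils that each hold all of `axis`.

    The two remaining axes are block-distributed over a hypothetical
    p1 x p2 process grid.
    """
    d1, d2 = [d for d in range(3) if d != axis]
    for s1, e1 in split(shape[d1], p1):
        for s2, e2 in split(shape[d2], p2):
            idx = [slice(None)] * 3
            idx[d1] = slice(s1, e1)
            idx[d2] = slice(s2, e2)
            yield tuple(idx)


def pencil_fft3d(a, p1=2, p2=2):
    """Serial simulation of a pencil-decomposed 3D FFT with a 'comm' thread.

    Stage order: FFT along axis 2, then axis 1, then axis 0. Between stages
    the data is redistributed (simulated here, MPI all-to-all in practice).
    """
    src = np.asarray(a, dtype=complex)
    # One worker thread plays the role of the dedicated communication thread.
    with ThreadPoolExecutor(max_workers=1) as comm:
        for axis in (2, 1, 0):
            dst = np.empty_like(src)
            pending = []
            for idx in pencil_slices(src.shape, axis, p1, p2):
                # Compute: the pencil holds the whole axis, so this 1D FFT is local.
                local = np.fft.fft(src[idx], axis=axis)
                # "Communicate": the comm thread stores this pencil while the
                # main thread moves on to the next one (disjoint slices of dst).
                pending.append(comm.submit(dst.__setitem__, idx, local))
            for f in pending:
                f.result()  # stage barrier: every pencil must arrive before the next axis
            src = dst
    return src
```

Usage: for a random array `x`, `pencil_fft3d(x, 2, 3)` should agree with `np.fft.fftn(x)` to floating-point tolerance. This holds because a 3D DFT factors into 1D DFTs along each axis in turn. In a distributed implementation, the redistribution between stages is typically an all-to-all within the row or column sub-communicator of the process grid, not a write to a shared array.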