TY - JOUR
T1 - Petascale turbulence simulation using a highly parallel fast multipole method on GPUs
AU - Yokota, Rio
AU - Barba, L. A.
AU - Narumi, Tetsu
AU - Yasuoka, Kenji
N1 - Funding Information:
Computing time on the TSUBAME 2.0 system was made possible by the Grand Challenge Program of TSUBAME 2.0. The current work was partially supported by the Core Research for Evolutional Science and Technology (CREST) program of the Japan Science and Technology Corporation (JST). LAB acknowledges funding from NSF grant OCI-0946441, ONR grant #N00014-11-1-0356 and NSF CAREER award OCI-1149784. LAB is also grateful for the support from Nvidia Corp. via an Academic Partnership award (Aug. 2011).
PY - 2013/3
Y1 - 2013/3
N2 - This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as the numerical engine, and match the current record in mesh size for this application, a cube of 4096³ computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME 2.0 system). The FFT-based spectral method achieves just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
AB - This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as the numerical engine, and match the current record in mesh size for this application, a cube of 4096³ computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME 2.0 system). The FFT-based spectral method achieves just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
KW - Fast multipole method
KW - Integral equations
KW - Isotropic turbulence
KW - GPU
UR - http://www.scopus.com/inward/record.url?scp=84872043994&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84872043994&partnerID=8YFLogxK
U2 - 10.1016/j.cpc.2012.09.011
DO - 10.1016/j.cpc.2012.09.011
M3 - Article
AN - SCOPUS:84872043994
SN - 0010-4655
VL - 184
SP - 445
EP - 455
JO - Computer Physics Communications
JF - Computer Physics Communications
IS - 3
ER -