42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence

Tsuyoshi Hamada, Tetsu Narumi, Rio Yokota, Kenji Yasuoka, Keigo Nitadori, Makoto Taiji

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

81 Citations (Scopus)

Abstract

As an entry for the 2009 Gordon Bell price/performance prize, we present the results of two different hierarchical N-body simulations on a cluster of 256 graphics processing units (GPUs). Unlike many previous N-body simulations on GPUs that scale as O(N²), the present method calculates the O(N log N) treecode and O(N) fast multipole method (FMM) on the GPUs with unprecedented efficiency. We demonstrate the performance of our method by choosing one standard application - a gravitational N-body simulation - and one non-standard application - simulation of turbulence using vortex particles. The gravitational simulation using the treecode with 1,608,044,129 particles showed a sustained performance of 42.15 TFlops. The vortex particle simulation of homogeneous isotropic turbulence using the periodic FMM with 16,777,216 particles showed a sustained performance of 20.2 TFlops. The overall cost of the hardware was 228,912 dollars. The maximum corrected performance is 28.1 TFlops for the gravitational simulation, which results in a cost performance of 124 MFlops/$. This correction is performed by counting the Flops based on the most efficient CPU algorithm. Any extra Flops that arise from the GPU implementation and parameter differences are not included in the 124 MFlops/$.
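
The cost-performance figure quoted above can be checked with simple arithmetic. The short script below is a minimal sketch, not taken from the paper (the authors' exact Flop-counting convention may differ slightly): dividing the corrected 28.1 TFlops by the 228,912-dollar hardware cost gives roughly 123 MFlops/$, consistent with the reported 124 MFlops/$ once rounding and accounting details are allowed for.

```python
# Back-of-the-envelope check of the cost performance quoted in the abstract.
# The 28.1 TFlops and 228,912-dollar figures come from the abstract; the exact
# Flop-counting convention used by the authors may differ slightly.
corrected_tflops = 28.1            # corrected sustained performance, in TFlops
hardware_cost_usd = 228_912        # total hardware cost, in dollars

mflops_per_dollar = corrected_tflops * 1e6 / hardware_cost_usd  # 1 TFlops = 1e6 MFlops
print(f"~{mflops_per_dollar:.0f} MFlops/$")  # prints ~123, in line with the reported 124 MFlops/$
```

The O(N log N) treecode mentioned in the abstract follows the Barnes-Hut idea: particles are sorted into a spatial tree, each cell is summarized by a multipole expansion, and a sufficiently distant cell is evaluated through that summary instead of through its individual particles. The sketch below is a generic Barnes-Hut gravitational treecode in plain Python, given only to illustrate that hierarchical idea; it is not the authors' GPU implementation, and the monopole-only expansion, the opening angle THETA, and the softening EPS are illustrative assumptions.

```python
import random

THETA = 0.5   # opening angle: larger values trade accuracy for speed (illustrative choice)
EPS   = 1e-3  # softening length to avoid a singular 1/r at zero separation (illustrative)

class Cell:
    """Cubic octree cell summarized by a monopole (total mass and center of mass)."""
    def __init__(self, center, half):
        self.center, self.half = center, half   # cell center and half-width
        self.mass, self.com = 0.0, (0.0, 0.0, 0.0)
        self.body = None                        # single particle if this is a non-empty leaf
        self.kids = None                        # child cells once the cell has been split

    def insert(self, pos, m):
        if self.kids is None and self.body is None:
            self.body = (pos, m)                # empty leaf: just store the particle
        else:
            if self.kids is None:               # occupied leaf: split and push the old body down
                self.kids = {}
                old, self.body = self.body, None
                self._push(*old)
            self._push(pos, m)
        total = self.mass + m                   # update the monopole on the way down
        self.com = tuple((self.com[i] * self.mass + pos[i] * m) / total for i in range(3))
        self.mass = total

    def _push(self, pos, m):
        idx = tuple(int(pos[i] > self.center[i]) for i in range(3))
        if idx not in self.kids:
            c = tuple(self.center[i] + (0.5 if idx[i] else -0.5) * self.half for i in range(3))
            self.kids[idx] = Cell(c, self.half / 2)
        self.kids[idx].insert(pos, m)

def accel(cell, pos):
    """Acceleration at pos, walking the tree with the Barnes-Hut
    multipole-acceptance criterion (cell size / distance < THETA)."""
    if cell.mass == 0.0:
        return (0.0, 0.0, 0.0)
    d = tuple(cell.com[i] - pos[i] for i in range(3))
    r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + EPS ** 2
    if cell.kids is None or (2.0 * cell.half) ** 2 < THETA ** 2 * r2:
        s = cell.mass / (r2 * r2 ** 0.5)        # monopole: with G = 1, a = m * d / |d|^3
        return tuple(s * d[i] for i in range(3))
    ax = ay = az = 0.0
    for kid in cell.kids.values():              # cell is too close/large: open it and recurse
        gx, gy, gz = accel(kid, pos)
        ax, ay, az = ax + gx, ay + gy, az + gz
    return (ax, ay, az)

# Toy run on 1,000 random unit-mass particles in a unit box.
random.seed(0)
bodies = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
root = Cell((0.5, 0.5, 0.5), 0.5)
for p in bodies:
    root.insert(p, 1.0)
print("acceleration on body 0:", accel(root, bodies[0]))
```

Because each of the N particles then interacts with O(log N) cells rather than with all N-1 other particles, the total work grows as O(N log N), which is what makes runs at the scale of the 1,608,044,129-particle gravitational simulation tractable; the FMM goes one step further and reduces the cost to O(N).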

Original language: English
Title of host publication: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
ISBN (Print): 9781605587448
DOI: https://doi.org/10.1145/1654059.1654123
Publication status: Published - 2009
Event: Conference on High Performance Computing Networking, Storage and Analysis, SC '09 - Portland, OR, United States
Duration: 2009 Nov 14 - 2009 Nov 20

Other

Other: Conference on High Performance Computing Networking, Storage and Analysis, SC '09
Country: United States
City: Portland, OR
Period: 09/11/14 - 09/11/20

Fingerprint

  • Astrophysics
  • Turbulence
  • Vortex flow
  • Program processors
  • Costs
  • Hardware
  • Graphics processing unit

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Hamada, T., Narumi, T., Yokota, R., Yasuoka, K., Nitadori, K., & Taiji, M. (2009). 42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09 [1654123]. https://doi.org/10.1145/1654059.1654123
