Task level pipelining on multiple accelerators via FPGA switch

Takaaki Miyajima, Takuya Kuhara, Toshihiro Hanawa, Hideharu Amano, Taisuke Boku

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present task-level pipelining on multiple accelerators with PEACH2. PEACH2, implemented on an FPGA, enables ultra-low-latency direct communication among multiple accelerators across computation nodes. With PEACH2 installed, typical high-performance computation nodes are tightly coupled. In this environment, an application can be accelerated by exploiting not only data-level parallelism but also task-level parallelism. Furthermore, multiple tasks can be processed on multiple accelerators in a pipelined manner. In our evaluation, an application implemented with task-level pipelining achieves a 52% speedup compared to a single GPU.
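
The abstract describes task-level pipelining: successive stages of an application are mapped to different accelerators, and consecutive data items overlap across those stages. The sketch below only illustrates that scheduling idea in plain Python; worker threads stand in for accelerators and queues stand in for the PEACH2 links. It is not the authors' GPU/PEACH2 implementation, and the stage functions are made-up placeholders.

# Minimal sketch of task-level pipelining (illustrative only, not the paper's code):
# each pipeline stage is bound to its own "accelerator" (modeled as a worker
# thread), and successive input items overlap across stages via queues.
import threading
import queue

SENTINEL = object()  # marks end of the input stream

def make_stage(name, fn, in_q, out_q):
    """Run `fn` on every item from in_q and forward the result to out_q."""
    def worker():
        while True:
            item = in_q.get()
            if item is SENTINEL:
                out_q.put(SENTINEL)  # propagate shutdown downstream
                break
            out_q.put(fn(item))
    t = threading.Thread(target=worker, name=name, daemon=True)
    t.start()
    return t

# Three placeholder tasks standing in for kernels offloaded to accelerators.
stage_fns = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
threads = [
    make_stage(f"accel{i}", fn, queues[i], queues[i + 1])
    for i, fn in enumerate(stage_fns)
]

# Feed a stream of inputs; while item k is in stage 2, item k+1 can occupy stage 1.
for x in range(5):
    queues[0].put(x)
queues[0].put(SENTINEL)

results = []
while (r := queues[-1].get()) is not SENTINEL:
    results.append(r)
print(results)  # [(x + 1) * 2 - 3 for x in range(5)] -> [-1, 1, 3, 5, 7]

In the paper's setting the stages would be GPU kernels and the hand-off between stages would go over the PEACH2 FPGA switch rather than in-process queues; the overlap of different items in different stages is what yields the reported speedup over a single GPU.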

Original language: English
Title of host publication: Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014
Publisher: Acta Press
Pages: 267-274
Number of pages: 8
DOIs: 10.2316/P.2014.811-026
Publication status: Published - 2014
Event: 12th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014 - Innsbruck, Austria
Duration: 2014 Feb 17 - 2014 Feb 19

Other

Other: 12th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014
Country: Austria
City: Innsbruck
Period: 14/2/17 - 14/2/19

Keywords

  • Accelerator computing
  • FPGA Interconnect
  • GPU cluster
  • Interconnect for accelerators
  • Task Level Pipeline

ASJC Scopus subject areas

  • Software

Cite this

Miyajima, T., Kuhara, T., Hanawa, T., Amano, H., & Boku, T. (2014). Task level pipelining on multiple accelerators via FPGA switch. In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014 (pp. 267-274). Acta Press. https://doi.org/10.2316/P.2014.811-026

Task level pipelining on multiple accelerators via FPGA switch. / Miyajima, Takaaki; Kuhara, Takuya; Hanawa, Toshihiro; Amano, Hideharu; Boku, Taisuke.

Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014. Acta Press, 2014. p. 267-274.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Miyajima, T, Kuhara, T, Hanawa, T, Amano, H & Boku, T 2014, Task level pipelining on multiple accelerators via FPGA switch. in Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014. Acta Press, pp. 267-274, 12th IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014, Innsbruck, Austria, 14/2/17. https://doi.org/10.2316/P.2014.811-026
Miyajima T, Kuhara T, Hanawa T, Amano H, Boku T. Task level pipelining on multiple accelerators via FPGA switch. In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014. Acta Press. 2014. p. 267-274 https://doi.org/10.2316/P.2014.811-026
Miyajima, Takaaki ; Kuhara, Takuya ; Hanawa, Toshihiro ; Amano, Hideharu ; Boku, Taisuke. / Task level pipelining on multiple accelerators via FPGA switch. Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014. Acta Press, 2014. pp. 267-274
@inproceedings{55b41d45988a40debd40ce7550e209a7,
title = "Task level pipelining on multiple accelerators via FPGA switch",
abstract = "We present task-level pipelining on multiple accelerators with PEACH2. PEACH2, implemented on an FPGA, enables ultra-low-latency direct communication among multiple accelerators across computation nodes. With PEACH2 installed, typical high-performance computation nodes are tightly coupled. In this environment, an application can be accelerated by exploiting not only data-level parallelism but also task-level parallelism. Furthermore, multiple tasks can be processed on multiple accelerators in a pipelined manner. In our evaluation, an application implemented with task-level pipelining achieves a 52{\%} speedup compared to a single GPU.",
keywords = "Accelerator computing, FPGA Interconnect, GPU cluster, Interconnect for accelerators, Task Level Pipeline",
author = "Takaaki Miyajima and Takuya Kuhara and Toshihiro Hanawa and Hideharu Amano and Taisuke Boku",
year = "2014",
doi = "10.2316/P.2014.811-026",
language = "English",
pages = "267--274",
booktitle = "Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014",
publisher = "Acta Press",

}

TY - GEN

T1 - Task level pipelining on multiple accelerators via FPGA switch

AU - Miyajima, Takaaki

AU - Kuhara, Takuya

AU - Hanawa, Toshihiro

AU - Amano, Hideharu

AU - Boku, Taisuke

PY - 2014

Y1 - 2014

N2 - We present task-level pipelining on multiple accelerators with PEACH2. PEACH2, implemented on an FPGA, enables ultra-low-latency direct communication among multiple accelerators across computation nodes. With PEACH2 installed, typical high-performance computation nodes are tightly coupled. In this environment, an application can be accelerated by exploiting not only data-level parallelism but also task-level parallelism. Furthermore, multiple tasks can be processed on multiple accelerators in a pipelined manner. In our evaluation, an application implemented with task-level pipelining achieves a 52% speedup compared to a single GPU.

AB - We present task-level pipelining on multiple accelerators with PEACH2. PEACH2, implemented on an FPGA, enables ultra-low-latency direct communication among multiple accelerators across computation nodes. With PEACH2 installed, typical high-performance computation nodes are tightly coupled. In this environment, an application can be accelerated by exploiting not only data-level parallelism but also task-level parallelism. Furthermore, multiple tasks can be processed on multiple accelerators in a pipelined manner. In our evaluation, an application implemented with task-level pipelining achieves a 52% speedup compared to a single GPU.

KW - Accelerator computing

KW - FPGA Interconnect

KW - GPU cluster

KW - Interconnect for accelerators

KW - Task Level Pipeline

UR - http://www.scopus.com/inward/record.url?scp=84898416063&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898416063&partnerID=8YFLogxK

U2 - 10.2316/P.2014.811-026

DO - 10.2316/P.2014.811-026

M3 - Conference contribution

SP - 267

EP - 274

BT - Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, PDCN 2014

PB - Acta Press

ER -