Accelerating Deep Learning using Multiple GPUs and FPGA-Based 10GbE Switch

Tomoya Itsubo, Michihiro Koibuchi, Hideharu Amano, Hiroki Matsutani

Research output: Conference contribution

Abstract

A back-propagation algorithm following a gradient descent approach is used for training deep neural networks. Since it iteratively performs a large number of matrix operations to compute the gradients, GPUs (Graphics Processing Units) are efficient especially for the training phase. Thus, a cluster of computers, each equipped with multiple GPUs, can significantly accelerate the training phase. Although gradient computation is still a major bottleneck of training, gradient aggregation and parameter optimization impose both communication and computation overheads, which should also be reduced to further shorten the training time. To address this issue, in this paper, multiple GPUs are interconnected using PCI Express (PCIe) over 10 Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected via network switches, gradient aggregation and optimizers (e.g., SGD, Adagrad, Adam, and SMORMS3) are offloaded to an FPGA-based network switch between a host machine and the remote GPUs; thus, gradient aggregation and optimization are completed in the network. Evaluation results using four remote GPUs connected via the FPGA-based 10GbE switch that implements the four optimizers demonstrate that these optimization algorithms are accelerated by up to 3.0x and 1.25x compared to CPU and GPU implementations, respectively. Also, the gradient aggregation throughput of the FPGA-based switch achieves 98.3% of the 10GbE line rate.
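The abstract describes two steps performed in the network: aggregating the gradients from the remote GPUs, then updating the parameters with one of four optimizers. The Python sketch below shows, in software, what those two steps compute for one parameter vector. It is a minimal illustration under stated assumptions, not the authors' FPGA design: it assumes the gradients are averaged (the abstract says "aggregation" without specifying sum or mean), uses the textbook update rules for SGD, Adagrad, and Adam, and uses SMORMS3 in its commonly cited form. The function and class names, hyperparameter defaults, and sample gradients are all hypothetical.

```python
import math


def aggregate(grads_per_gpu):
    """Element-wise mean of the gradient vectors received from each GPU (assumed: mean, not sum)."""
    n = len(grads_per_gpu)
    return [sum(vals) / n for vals in zip(*grads_per_gpu)]


class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, w, g):
        # w <- w - lr * g
        return [wi - self.lr * gi for wi, gi in zip(w, g)]


class Adagrad:
    def __init__(self, lr=0.01, eps=1e-8):
        self.lr, self.eps, self.h = lr, eps, None

    def step(self, w, g):
        if self.h is None:
            self.h = [0.0] * len(w)
        # Accumulate squared gradients, then scale the step per parameter.
        self.h = [hi + gi * gi for hi, gi in zip(self.h, g)]
        return [wi - self.lr * gi / (math.sqrt(hi) + self.eps)
                for wi, gi, hi in zip(w, g, self.h)]


class Adam:
    def __init__(self, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, w, g):
        if self.m is None:
            self.m, self.v = [0.0] * len(w), [0.0] * len(w)
        self.t += 1
        # First and second moment estimates.
        self.m = [self.b1 * mi + (1 - self.b1) * gi for mi, gi in zip(self.m, g)]
        self.v = [self.b2 * vi + (1 - self.b2) * gi * gi for vi, gi in zip(self.v, g)]
        # Bias correction for the zero-initialized moments.
        c1, c2 = 1 - self.b1 ** self.t, 1 - self.b2 ** self.t
        return [wi - self.lr * (mi / c1) / (math.sqrt(vi / c2) + self.eps)
                for wi, mi, vi in zip(w, self.m, self.v)]


class SMORMS3:
    def __init__(self, lr=0.001, eps=1e-16):
        self.lr, self.eps = lr, eps
        self.mem = self.g = self.g2 = None

    def step(self, w, grad):
        n = len(w)
        if self.mem is None:
            self.mem, self.g, self.g2 = [1.0] * n, [0.0] * n, [0.0] * n
        out = []
        for i in range(n):
            r = 1.0 / (self.mem[i] + 1.0)
            self.g[i] = (1 - r) * self.g[i] + r * grad[i]
            self.g2[i] = (1 - r) * self.g2[i] + r * grad[i] ** 2
            ratio = self.g[i] ** 2 / (self.g2[i] + self.eps)
            step = grad[i] * min(self.lr, ratio) / (math.sqrt(self.g2[i]) + self.eps)
            out.append(w[i] - step)
            # Adapt the effective averaging window per parameter.
            self.mem[i] = 1.0 + self.mem[i] * (1.0 - ratio)
        return out


if __name__ == "__main__":
    # Hypothetical gradients for a 2-parameter model from four remote GPUs.
    grads = [[0.4, -0.8], [0.2, -0.4], [0.6, -1.2], [0.0, 0.0]]
    g = aggregate(grads)
    w = [1.0, 1.0]
    for opt in (SGD(lr=0.1), Adagrad(), Adam(), SMORMS3()):
        print(type(opt).__name__, opt.step(w, g))
```

Each rule is element-wise and keeps only a small per-parameter state (none for SGD, one accumulator for Adagrad, two moments for Adam, three values for SMORMS3). That is plausibly why such optimizers lend themselves to being applied as aggregated gradients stream through a switch, although the abstract does not describe the FPGA pipeline itself.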

Original language: English
Host publication title: Proceedings - 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 102-109
Number of pages: 8
ISBN (electronic): 9781728165820
DOI: https://doi.org/10.1109/PDP50117.2020.00022
Publication status: Published - March 2020
Event: 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020 - Vasteras, Sweden
Duration: 11 March 2020 to 13 March 2020

Publication series

Name: Proceedings - 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020

Conference

Conference: 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020
Country: Sweden
City: Vasteras
Period: 11 March 2020 to 13 March 2020

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems and Management
  • Computational Mathematics
  • Control and Optimization
  • Health Informatics


Cite this

    Itsubo, T., Koibuchi, M., Amano, H., & Matsutani, H. (2020). Accelerating Deep Learning using Multiple GPUs and FPGA-Based 10GbE Switch. In Proceedings - 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020 (pp. 102-109). [9092145] (Proceedings - 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PDP50117.2020.00022