Acceleration of deep recurrent neural networks with an FPGA cluster

Yuxi Sun, Akram Ben Ahmed, Hideharu Amano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose an acceleration methodology for deep recurrent neural networks (RNNs) implemented on a multi-FPGA platform called Flow-in-Cloud (FiC). RNNs have proven effective for modeling temporal sequences, such as human speech and written text. However, the implementation of RNNs on traditional hardware is inefficient due to their long-range dependencies and irregular computation patterns. This inefficiency manifests itself in the run time of deep RNNs growing in proportion to the number of layers when running on traditional hardware platforms such as CPUs. Previous works have mostly focused on optimizing a single RNN cell. In this work, we take advantage of the multi-FPGA system to demonstrate that the run time of deep RNNs can be reduced from O(k) to O(1), where k is the number of layers.
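The O(k)-to-O(1) claim follows from layer pipelining: with one device per layer, successive time steps overlap across layers, so steady-state output latency no longer depends on the layer count. A minimal sketch of this scheduling argument (not the paper's implementation; cycle counts and function names are illustrative assumptions):

```python
# Hedged sketch: why pipelining k RNN layers across k devices turns
# per-step cost from O(k) into O(1). One "cycle" = one layer's compute
# time; these counts are an idealized model, not measured FiC numbers.

def sequential_cycles(T: int, k: int) -> int:
    # Single device: each of the T time steps must pass through
    # all k layers one after another.
    return T * k

def pipelined_cycles(T: int, k: int) -> int:
    # k devices, one layer each: after a (k - 1)-cycle pipeline fill,
    # one result is produced every cycle.
    return T + k - 1

T, k = 100, 8
print(sequential_cycles(T, k))  # 800 cycles: grows linearly with k
print(pipelined_cycles(T, k))   # 107 cycles: fill cost only, O(1) per step
```

In this idealized model the steady-state throughput is one time step per cycle regardless of k, which is the sense in which the per-step run time becomes O(1).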

Original language: English
Title of host publication: Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450372558
DOI: 10.1145/3337801.3337804
Publication status: Published - 2019 Jun 6
Event: 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2019 - Nagasaki, Japan
Duration: 2019 Jun 6 - 2019 Jun 7

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2019
Country: Japan
City: Nagasaki
Period: 19/6/6 - 19/6/7

Keywords

  • Acceleration
  • FPGAs
  • Recurrent Neural Networks

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Sun, Y., Ben Ahmed, A., & Amano, H. (2019). Acceleration of deep recurrent neural networks with an FPGA cluster. In Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2019 [18] (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3337801.3337804


Scopus record: SCOPUS:85070566081