Implementing a large application (LSTM) on the multi-FPGA system: Flow-in-Cloud

Yugo Yamauchi, Kazusa Musha, Hideharu Amano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

To cope with the computation cost and energy required by recent deep learning technology, domain-specific systems have been deployed in cloud computing so that they can be shared efficiently by many application developers. Although GPUs (Graphics Processing Units) and more specialized systems such as TPUs (Tensor Processing Units) have been widely adopted, FPGAs have been attracting attention, especially for their power efficiency and flexibility. Since energy efficiency is one of the most important issues in recent cloud computing, much research on using FPGAs in the cloud has been reported [3], and commercial systems, including Amazon's F1 instance, are available. However, the performance improvement achievable on an FPGA is bounded by its resources, even for an FPGA in the cloud. Thus, to implement a large deep learning application, we must either use an expensive high-end FPGA or adopt a lightweight algorithm that sacrifices throughput and accuracy. To address this problem, the project 'Power-saving AI engine and platform with heterogeneous engine integrated cloud', supported by NEDO, started developing a large-scale AI system called Flow-in-Cloud (FiC) [4]. FiC consists of a number of mid-scale, economical FPGAs interconnected by a high-bandwidth network. From the viewpoint of an HLS (High Level Synthesis) programmer, the FPGAs can be handled as if they were a single large FPGA, so a large-scale deep learning model can be implemented without worrying about the resources of any single FPGA. FiC is managed by Flow-OS and shared efficiently by many users. Although FiC is designed to build a heterogeneous computing system, the current prototype consists of multiple FPGA boards called 'FiC-SW', each of which provides both switching and computing capabilities. Here, as a case study of such energy-efficient multi-FPGA computing, we implemented the inference part of a Long Short-Term Memory (LSTM) network [1].
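
As a rough, hypothetical illustration of the kind of partitioning such a multi-board system enables, the following minimal C++ sketch shows one LSTM inference step in which the stacked gate matrix-vector products are split row-wise across several boards. It is a plain software sketch, not the authors' HLS design; the constants NUM_BOARDS, HIDDEN, and INPUT, the zero/constant placeholder weights, and the row-wise partitioning scheme are all assumptions for illustration only.

// Illustrative software sketch (not the authors' HLS implementation) of one
// LSTM inference step with the gate matrix-vector products partitioned
// row-wise across several boards, as a multi-FPGA system like FiC might do.
#include <cmath>
#include <cstdio>
#include <vector>

constexpr int NUM_BOARDS = 4;   // hypothetical number of FiC-SW boards
constexpr int HIDDEN = 64;      // hidden-state size (illustrative)
constexpr int INPUT  = 32;      // input size (illustrative)

static float sigmoidf(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Each "board" owns a horizontal slice of the stacked gate weights
// [W_i; W_f; W_g; W_o] (4*HIDDEN rows) and produces its slice of the
// gate pre-activations; only the input x and hidden state h are broadcast.
struct BoardSlice {
    int row_begin, row_end;                 // rows of the stacked gate matrix
    std::vector<float> Wx, Wh, b;           // (rows x INPUT), (rows x HIDDEN), rows

    void compute(const std::vector<float>& x, const std::vector<float>& h,
                 std::vector<float>& preact) const {
        for (int r = row_begin; r < row_end; ++r) {
            float acc = b[r - row_begin];
            for (int j = 0; j < INPUT; ++j)  acc += Wx[(r - row_begin) * INPUT + j] * x[j];
            for (int j = 0; j < HIDDEN; ++j) acc += Wh[(r - row_begin) * HIDDEN + j] * h[j];
            preact[r] = acc;                 // would be sent over the FiC network
        }
    }
};

int main() {
    std::vector<float> x(INPUT, 0.1f), h(HIDDEN, 0.0f), c(HIDDEN, 0.0f);
    std::vector<float> preact(4 * HIDDEN, 0.0f);

    // Build per-board slices with placeholder weights.
    std::vector<BoardSlice> boards(NUM_BOARDS);
    int rows_per_board = (4 * HIDDEN) / NUM_BOARDS;
    for (int bnum = 0; bnum < NUM_BOARDS; ++bnum) {
        BoardSlice& s = boards[bnum];
        s.row_begin = bnum * rows_per_board;
        s.row_end   = s.row_begin + rows_per_board;
        s.Wx.assign(rows_per_board * INPUT, 0.01f);
        s.Wh.assign(rows_per_board * HIDDEN, 0.01f);
        s.b.assign(rows_per_board, 0.0f);
    }

    // Each board computes its slice of the gate pre-activations in parallel
    // (shown sequentially here); results are gathered into `preact`.
    for (const BoardSlice& s : boards) s.compute(x, h, preact);

    // Elementwise gate activations and state update (could run on one board).
    for (int k = 0; k < HIDDEN; ++k) {
        float i = sigmoidf(preact[k]);
        float f = sigmoidf(preact[HIDDEN + k]);
        float g = std::tanh(preact[2 * HIDDEN + k]);
        float o = sigmoidf(preact[3 * HIDDEN + k]);
        c[k] = f * c[k] + i * g;
        h[k] = o * std::tanh(c[k]);
    }
    std::printf("h[0] = %f\n", h[0]);
    return 0;
}

The point of the sketch is simply that only the (small) vectors x and h must be exchanged between boards each timestep, while the (large) weight matrices stay resident on the board that owns them, which is the property that makes a high-bandwidth inter-FPGA network attractive for large models.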

Original language: English
Title of host publication: IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728117485
DOI: 10.1109/CoolChips.2019.8721333
Publication status: Published - 2019 May 23
Event: 22nd IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Yokohama, Japan
Duration: 2019 Apr 17 - 2019 Apr 19

Publication series

Name: IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings

Conference

Conference: 22nd IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019
Country: Japan
City: Yokohama
Period: 19/4/17 - 19/4/19

Fingerprint

Field programmable gate arrays (FPGA)
Cloud computing
Long short-term memory
Engines
Tensors
Energy efficiency
Large scale systems
Throughput
Bandwidth
Communication

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Yamauchi, Y., Musha, K., & Amano, H. (2019). Implementing a large application (LSTM) on the multi-FPGA system: Flow-in-Cloud. In IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings [8721333] (IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CoolChips.2019.8721333
