Implementing a large application(LSTM) on the multi-FPGA system: Flow-in-Cloud

Yugo Yamauchi, Kazusa Musha, Hideharu Amano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

To cope with the computation cost and energy required by recent deep learning technology, domain-specific systems have been deployed in cloud computing so that many application developers can share them efficiently. Although GPUs (Graphics Processing Units) and more specialized systems such as TPUs (Tensor Processing Units) have been widely used, FPGAs have been receiving attention, especially for their power efficiency and flexibility. Since energy efficiency is one of the most important issues in recent cloud computing, much research on using FPGAs in the cloud has been reported [3], and commercial systems, including Amazon's F1 instance, are available. However, even for an FPGA in the cloud, performance improvement is limited by the upper bound of the FPGA's resources. Thus, to implement a large deep learning application, we must either use an expensive high-end FPGA or adopt a lightweight algorithm that sacrifices throughput and accuracy. To address this problem, the project 'Power-saving AI engine and platform with heterogeneous engine integrated cloud', supported by NEDO, started to develop a large-scale AI system called Flow-in-Cloud (FiC) [4]. FiC consists of a number of mid-scale, economical FPGAs interconnected by a high-bandwidth network. To an HLS (High Level Synthesis) programmer, the many FPGAs can be handled as if they were a single large FPGA, so a large-scale deep learning model can be implemented without worrying about the resources of any single FPGA. FiC is managed by Flow-OS and shared efficiently by many users. Although FiC is designed to form a heterogeneous computing system, the current prototype consists of multiple FPGA boards called 'FiC-SW', each of which provides both switching and computing capabilities. Here, as a case study of such energy-efficient multi-FPGA-board computing, we implemented the inference part of Long Short-Term Memory (LSTM) [1].
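The LSTM inference workload mentioned in the abstract boils down to repeated matrix-vector products plus elementwise gate activations per time step, which is what gets partitioned across the FiC-SW boards. As a minimal sketch of that per-step computation (not the paper's actual FPGA implementation; dimensions, weights, and gate ordering here are illustrative assumptions), a single LSTM cell step can be written as:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Dense matrix-vector product: the dominant cost of LSTM inference.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_cell(x, h, c, Wx, Wh, b):
    """One LSTM inference step.

    Wx: (4H x D) input weights, Wh: (4H x H) recurrent weights, b: (4H) bias,
    with gates stacked in the (assumed) order i, f, g, o.
    """
    H = len(h)
    # Pre-activations for all four gates at once.
    z = [a + r + bb for a, r, bb in zip(matvec(Wx, x), matvec(Wh, h), b)]
    i = [sigmoid(v) for v in z[0:H]]          # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]      # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell state
    o = [sigmoid(v) for v in z[3 * H:4 * H]]  # output gate
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

# Toy dimensions and weights (hypothetical, for illustration only).
D, H = 2, 2
Wx = [[0.1] * D for _ in range(4 * H)]
Wh = [[0.1] * H for _ in range(4 * H)]
b = [0.0] * (4 * H)
h, c = lstm_cell([1.0, 0.5], [0.0] * H, [0.0] * H, Wx, Wh, b)
```

Because the recurrent matvec `matvec(Wh, h)` depends on the previous step's output, the time loop is sequential; what a multi-FPGA system can parallelize is the work *within* a step (rows of `Wx`/`Wh` spread across boards), which is why inter-board bandwidth matters.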

Original language: English
Title of host publication: IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728117485
DOIs: https://doi.org/10.1109/CoolChips.2019.8721333
Publication status: Published - 2019 May 23
Event: 22nd IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Yokohama, Japan
Duration: 2019 Apr 17 - 2019 Apr 19

Publication series

Name: IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings

Conference

Conference: 22nd IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019
Country: Japan
City: Yokohama
Period: 19/4/17 - 19/4/19

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering


Cite this

Yamauchi, Y., Musha, K., & Amano, H. (2019). Implementing a large application (LSTM) on the multi-FPGA system: Flow-in-Cloud. In IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings [8721333] (IEEE Symposium on Low-Power and High-Speed Chips and Systems, COOL CHIPS 2019 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CoolChips.2019.8721333