TY - GEN
T1 - A programming environment for multi-FPGA systems based on CyberWorkBench
T2 - 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2021
AU - Suzuki, Hiroaki
AU - Takahashi, Wataru
AU - Wakabayashi, Kazutoshi
AU - Amano, Hideharu
N1 - Funding Information:
This work was supported by JST CREST Grant Number JPMJCR19K1, Japan.
Publisher Copyright:
© 2021 ACM.
PY - 2021/6/21
Y1 - 2021/6/21
N2 - This paper proposes a programming environment for the multi-FPGA system FiC (Flow-in-Cloud), based on NEC's integrated design tool CyberWorkBench (CWB). Programmers describe a program in SystemC as small modules connected by FIFO channels, then verify its operation with a behavioral simulation that accounts for parallel execution. After high-level synthesis (HLS) with CWB, the assignment of modules to boards is decided and the interface module is inserted. A cycle-accurate simulation is then run to confirm correct operation and estimate performance. Finally, the generated Verilog HDL code for each board is implemented with Xilinx's Vivado, as in a traditional design flow, to obtain the configuration. As an example, a simple convolutional neural network, LeNet, is described and implemented on a real system using the tool. Although the cycle-accurate simulation takes 105.34 s, the estimated cycle count differs by only 2.2% from the execution result on the real boards. Because the example CNN is small, it can be implemented on a single board with a traditional design tool. However, with pipelined execution in mind, parallel execution on two boards can place the input and output on different FPGAs and relieve the bottleneck.
UR - http://www.scopus.com/inward/record.url?scp=85109395298&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109395298&partnerID=8YFLogxK
U2 - 10.1145/3468044.3468049
DO - 10.1145/3468044.3468049
M3 - Conference contribution
AN - SCOPUS:85109395298
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, HEART 2021
PB - Association for Computing Machinery
Y2 - 21 June 2021 through 23 June 2021
ER -