Low-reliable low-latency networks optimized for HPC parallel applications

Truong Thao Nguyen, Hiroki Matsutani, Michihiro Koibuchi

Research output: Conference contribution

5 Citations (Scopus)

Abstract

High-end network standards such as 400GbE have introduced Forward Error Correction (FEC) to maintain the same bit error rate (BER) as traditional low-bandwidth interconnection networks. However, the latency overhead of FEC operation is, surprisingly, higher than the sum of all other switch operation overheads, e.g., routing computation and switch allocation, and it significantly degrades the performance of parallel applications in HPC systems. In this study, we instead explore a low-latency network design using a Hamming code that does not guarantee strictly error-free communication. Because the design is consistent with the existing frame format based on standard Reed-Solomon RS(544,514) with DC(64b/66b) direct line code and TC(256b/257b) transcode, its influence on the design of other network layers is limited. Interestingly, a large number of parallel applications can tolerate the BER of such a Hamming code. Because relaxing the BER in this way reduces switch operation latency, the proposed network using the Hamming code improves the execution time of NAS Parallel Benchmarks by 56% on average compared to the counterpart RS-FEC networks.
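The abstract does not state the parameters of the Hamming code used, so the following is only a minimal sketch based on the textbook Hamming(7,4) code, chosen purely for illustration. It shows the trade-off the abstract describes: a single bit error per codeword is corrected with a few XOR operations, but a double bit error is silently miscorrected, so communication is not strictly error-free. The function names `hamming74_encode` and `hamming74_decode` are hypothetical and do not come from the paper.

```python
def hamming74_encode(data):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Decode a 7-bit word, correcting at most one flipped bit."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the suspected error
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the suspected bit
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    sent = hamming74_encode(data)

    # One flipped bit: the decoder recovers the original data.
    one_error = list(sent)
    one_error[4] ^= 1
    print("single error corrected:", hamming74_decode(one_error) == data)

    # Two flipped bits: the decoder "corrects" the wrong position,
    # so the delivered data is wrong and the error goes undetected.
    two_errors = list(sent)
    two_errors[0] ^= 1
    two_errors[5] ^= 1
    print("double error corrected:", hamming74_decode(two_errors) == data)
```

By contrast, RS(544,514) carries 30 parity symbols per codeword and can correct up to 15 symbol errors, which gives much stronger protection at the cost of a longer decoding pipeline.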

Original language: English
Title of host publication: NCA 2018 - 2018 IEEE 17th International Symposium on Network Computing and Applications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538676592
DOI
Publication status: Published - Nov 26, 2018
Event: 17th IEEE International Symposium on Network Computing and Applications, NCA 2018 - Cambridge, United States
Duration: Nov 1, 2018 to Nov 3, 2018

Publication series

Name: NCA 2018 - 2018 IEEE 17th International Symposium on Network Computing and Applications

Other

Other: 17th IEEE International Symposium on Network Computing and Applications, NCA 2018
Country/Territory: United States
City: Cambridge
Period: Nov 1, 2018 to Nov 3, 2018

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Safety, Risk, Reliability and Quality
