TY - GEN
T1 - Performance Evaluation of PEACH3
T2 - 8th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2017
AU - Kaneda, Takahiro
AU - Sakai, Ryotaro
AU - Nishikawa, Naoki
AU - Hanawa, Toshihiro
AU - Tsuruta, Chiharu
AU - Amano, Hideharu
N1 - Funding Information:
The present study was supported in part by the JST/CREST program entitled “Research and Development on Unified Environment of Accelerated Computing and Interconnection for Post-Petascale Era” in the research area of “Development of System Software Technologies for post-Peta Scale High Performance Computing”.
PY - 2017/6/7
Y1 - 2017/6/7
N2 - An FPGA switching hub for the tightly coupled accelerators (TCA) architecture, called PEACH3 (PCI-Express Adaptive Communication Hub ver. 3), is evaluated and its communication speed is analyzed. PEACH3 connects a number of GPUs directly through PCI Express Gen3 x8 ports. The latency of inter-node GPU-GPU communication with PEACH3 was about 2.8 µs, which is one third of that of the CUDA API with MPI/InfiniBand. The bandwidth was about 1.21 times that of the previous version, PEACH2, and 1.54 times that of MPI/InfiniBand for a 512 KB data transfer. Two application programs, BFS (breadth-first search) and CG (conjugate gradient), were implemented with TCA IP and CUDA IP with MPI/InfiniBand. The performance of BFS with PEACH3 was 1.16 times better than that with PEACH2, and 1.3 times better than that with MPI/InfiniBand, for a graph with scale = 15. In CG, for the small matrix (CLASS=S), PEACH3 achieved 12% better performance than PEACH2 and 25% better than MPI/InfiniBand. However, since the bandwidth of PEACH3 with PCI Express Gen3 x8 is smaller than that of InfiniBand with PCI Express Gen3 x16, the performance benefit disappeared for the CLASS=A matrix. The evaluation suggests that, if the data size is small, using the TCA API with PEACH3 is advantageous even for intra-node communication.
AB - An FPGA switching hub for the tightly coupled accelerators (TCA) architecture, called PEACH3 (PCI-Express Adaptive Communication Hub ver. 3), is evaluated and its communication speed is analyzed. PEACH3 connects a number of GPUs directly through PCI Express Gen3 x8 ports. The latency of inter-node GPU-GPU communication with PEACH3 was about 2.8 µs, which is one third of that of the CUDA API with MPI/InfiniBand. The bandwidth was about 1.21 times that of the previous version, PEACH2, and 1.54 times that of MPI/InfiniBand for a 512 KB data transfer. Two application programs, BFS (breadth-first search) and CG (conjugate gradient), were implemented with TCA IP and CUDA IP with MPI/InfiniBand. The performance of BFS with PEACH3 was 1.16 times better than that with PEACH2, and 1.3 times better than that with MPI/InfiniBand, for a graph with scale = 15. In CG, for the small matrix (CLASS=S), PEACH3 achieved 12% better performance than PEACH2 and 25% better than MPI/InfiniBand. However, since the bandwidth of PEACH3 with PCI Express Gen3 x8 is smaller than that of InfiniBand with PCI Express Gen3 x16, the performance benefit disappeared for the CLASS=A matrix. The evaluation suggests that, if the data size is small, using the TCA API with PEACH3 is advantageous even for intra-node communication.
KW - Cluster
KW - GPU
KW - PEACH3
KW - TCA
UR - http://www.scopus.com/inward/record.url?scp=85040669115&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040669115&partnerID=8YFLogxK
U2 - 10.1145/3120895.3120911
DO - 10.1145/3120895.3120911
M3 - Conference contribution
AN - SCOPUS:85040669115
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 8th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2017
PB - Association for Computing Machinery
Y2 - 7 June 2017 through 9 June 2017
ER -