Performance improvement methodology for ClearSpeed's CSX600

Yuri Nishikawa, Michihiro Koibuchi, Masato Yoshimi, Kenichi Miura, Hideharu Amano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper focuses on the performance of the network-on-a-chip (NoC) and I/O of ClearSpeed's CSX600 coprocessor, which has 96 multithreaded processing elements. Two versions of the Himeno Benchmark were implemented on the CSX600 to evaluate its performance under frequent memory transfers between shared and local memories, or between local memories. To use the NoC bandwidth efficiently, the dataflow was customized to the one-dimensional array structure of the CSX600's NoC. The evaluation and profiling results indicate that the performance was lower than 1/50 of the sustained performance. We show three key points for improving performance in such a case: 1) exploiting bandwidth between mono and poly memory, 2) further program tuning, and 3) architectural reform.

Original language: English
Title of host publication: Proceedings of the International Conference on Parallel Processing
DOIs: https://doi.org/10.1109/ICPP.2007.66
Publication status: Published - 2007
Event: 36th International Conference on Parallel Processing in Xi'an, ICPP - Xi'an, China
Duration: 2007 Sep 10 to 2007 Sep 14

Other

Other: 36th International Conference on Parallel Processing in Xi'an, ICPP
Country: China
City: Xi'an
Period: 07/9/10 to 07/9/14

Fingerprint

Data storage equipment
Bandwidth
Tuning
Processing
Coprocessor

ASJC Scopus subject areas

  • Hardware and Architecture
  • Engineering(all)

Cite this

Nishikawa, Y., Koibuchi, M., Yoshimi, M., Miura, K., & Amano, H. (2007). Performance improvement methodology for ClearSpeed's CSX600. In Proceedings of the International Conference on Parallel Processing [4343884] https://doi.org/10.1109/ICPP.2007.66

Performance improvement methodology for ClearSpeed's CSX600. / Nishikawa, Yuri; Koibuchi, Michihiro; Yoshimi, Masato; Miura, Kenichi; Amano, Hideharu.

Proceedings of the International Conference on Parallel Processing. 2007. 4343884.

Nishikawa, Y, Koibuchi, M, Yoshimi, M, Miura, K & Amano, H 2007, Performance improvement methodology for ClearSpeed's CSX600. In Proceedings of the International Conference on Parallel Processing, 4343884, 36th International Conference on Parallel Processing in Xi'an, ICPP, Xi'an, China, 07/9/10. https://doi.org/10.1109/ICPP.2007.66
Nishikawa Y, Koibuchi M, Yoshimi M, Miura K, Amano H. Performance improvement methodology for ClearSpeed's CSX600. In Proceedings of the International Conference on Parallel Processing. 2007. 4343884 https://doi.org/10.1109/ICPP.2007.66
Nishikawa, Yuri ; Koibuchi, Michihiro ; Yoshimi, Masato ; Miura, Kenichi ; Amano, Hideharu. / Performance improvement methodology for ClearSpeed's CSX600. Proceedings of the International Conference on Parallel Processing. 2007.
@inproceedings{82631cced3864537b8380bec42464658,
title = "Performance improvement methodology for ClearSpeed's CSX600",
abstract = "This paper focuses on the performance of the network-on-a-chip (NoC) and I/O of ClearSpeed's CSX600 coprocessor, which has 96 multithreaded processing elements. Two versions of the Himeno Benchmark were implemented on the CSX600 to evaluate its performance under frequent memory transfers between shared and local memories, or between local memories. To use the NoC bandwidth efficiently, the dataflow was customized to the one-dimensional array structure of the CSX600's NoC. The evaluation and profiling results indicate that the performance was lower than 1/50 of the sustained performance. We show three key points for improving performance in such a case: 1) exploiting bandwidth between mono and poly memory, 2) further program tuning, and 3) architectural reform.",
author = "Yuri Nishikawa and Michihiro Koibuchi and Masato Yoshimi and Kenichi Miura and Hideharu Amano",
year = "2007",
doi = "10.1109/ICPP.2007.66",
language = "English",
isbn = "076952933X",
booktitle = "Proceedings of the International Conference on Parallel Processing",

}

TY  - GEN
T1  - Performance improvement methodology for ClearSpeed's CSX600
AU  - Nishikawa, Yuri
AU  - Koibuchi, Michihiro
AU  - Yoshimi, Masato
AU  - Miura, Kenichi
AU  - Amano, Hideharu
PY  - 2007
Y1  - 2007
AB  - This paper focuses on the performance of the network-on-a-chip (NoC) and I/O of ClearSpeed's CSX600 coprocessor, which has 96 multithreaded processing elements. Two versions of the Himeno Benchmark were implemented on the CSX600 to evaluate its performance under frequent memory transfers between shared and local memories, or between local memories. To use the NoC bandwidth efficiently, the dataflow was customized to the one-dimensional array structure of the CSX600's NoC. The evaluation and profiling results indicate that the performance was lower than 1/50 of the sustained performance. We show three key points for improving performance in such a case: 1) exploiting bandwidth between mono and poly memory, 2) further program tuning, and 3) architectural reform.
UR  - http://www.scopus.com/inward/record.url?scp=47249164386&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=47249164386&partnerID=8YFLogxK
U2  - 10.1109/ICPP.2007.66
DO  - 10.1109/ICPP.2007.66
M3  - Conference contribution
SN  - 076952933X
SN  - 9780769529332
BT  - Proceedings of the International Conference on Parallel Processing
ER  -