Performance improvement methodology for ClearSpeed's CSX600

Yuri Nishikawa, Michihiro Koibuchi, Masato Yoshimi, Kenichi Miura, Hideharu Amano

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)


This paper focuses on the performance of the network-on-a-chip (NoC) and I/O of ClearSpeed's CSX600 coprocessor, which has 96 multithreaded processing elements. Two versions of the Himeno Benchmark were implemented on the CSX600 to evaluate its performance under frequent memory transfers, either between shared and local memories or between local memories. To use the NoC bandwidth efficiently, the dataflow was customized to the one-dimensional array structure of the CSX600's NoC. The results of evaluation and profiling indicate that the performance was lower than 1/50 of the sustained performance. We identify three key points for improving performance in such a case: 1) exploiting the bandwidth between mono and poly memory, 2) further program tuning, and 3) architectural reform.

Original language: English
Title of host publication: 2007 International Conference on Parallel Processing, ICPP
Publication status: Published - 2007 Dec 1
Event: 36th International Conference on Parallel Processing in Xi'an, ICPP - Xi'an, China
Duration: 2007 Sept 10 to 2007 Sept 14

Publication series

Name: Proceedings of the International Conference on Parallel Processing
ISSN (Print): 0190-3918


Other: 36th International Conference on Parallel Processing in Xi'an, ICPP

ASJC Scopus subject areas

  • Hardware and Architecture
  • Engineering(all)

