Performance analysis of ClearSpeed's CSX600 interconnects

Yuri Nishikawa, Michihiro Koibuchi, Masato Yoshimi, Akihiro Shitara, Kenichi Miura, Hideharu Amano

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

ClearSpeed's CSX600, which consists of 96 Processing Elements (PEs), employs a one-dimensional array topology for simple SIMD processing. To clarify the performance factors and practical issues of networks-on-chip (NoCs) in an existing modern many-core SIMD system, this paper measures and analyzes the CSX600's NoCs, Swazzle and ClearConnect. The evaluation and analysis show that sending and receiving overheads are the major factors limiting effective network bandwidth. We found that three factors determine how well the bandwidth is used: (1) the number of PEs used, (2) the size of the transferred data, and (3) data alignment in shared memory. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.

Original language: English
Title of host publication: Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009
Pages: 203-210
Number of pages: 8
DOIs: https://doi.org/10.1109/ISPA.2009.102
Publication status: Published - 2009
Event: 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009 - Chengdu, Sichuan, China
Duration: 2009 Aug 9 → 2009 Aug 12

Other

Other: 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009
Country: China
City: Chengdu, Sichuan
Period: 09/8/9 → 09/8/12

Fingerprint

Processing
Bandwidth
Data transfer
Topology
Data storage equipment

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Nishikawa, Y., Koibuchi, M., Yoshimi, M., Shitara, A., Miura, K., & Amano, H. (2009). Performance analysis of clearspeed's CSX600 interconnects. In Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009 (pp. 203-210). [5207934] https://doi.org/10.1109/ISPA.2009.102

Performance analysis of clearspeed's CSX600 interconnects. / Nishikawa, Yuri; Koibuchi, Michihiro; Yoshimi, Masato; Shitara, Akihiro; Miura, Kenichi; Amano, Hideharu.

Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009. 2009. p. 203-210 5207934.

Nishikawa, Y, Koibuchi, M, Yoshimi, M, Shitara, A, Miura, K & Amano, H 2009, Performance analysis of clearspeed's CSX600 interconnects. in Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009., 5207934, pp. 203-210, 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009, Chengdu, Sichuan, China, 09/8/9. https://doi.org/10.1109/ISPA.2009.102
Nishikawa Y, Koibuchi M, Yoshimi M, Shitara A, Miura K, Amano H. Performance analysis of clearspeed's CSX600 interconnects. In Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009. 2009. p. 203-210. 5207934 https://doi.org/10.1109/ISPA.2009.102
Nishikawa, Yuri ; Koibuchi, Michihiro ; Yoshimi, Masato ; Shitara, Akihiro ; Miura, Kenichi ; Amano, Hideharu. / Performance analysis of clearspeed's CSX600 interconnects. Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009. 2009. pp. 203-210
@inproceedings{93bff7b088c4492f9a6b66f16a7843b4,
title = "Performance analysis of clearspeed's CSX600 interconnects",
abstract = "ClearSpeed's CSX600, which consists of 96 Processing Elements (PEs), employs a one-dimensional array topology for simple SIMD processing. To clarify the performance factors and practical issues of networks-on-chip (NoCs) in an existing modern many-core SIMD system, this paper measures and analyzes the CSX600's NoCs, Swazzle and ClearConnect. The evaluation and analysis show that sending and receiving overheads are the major factors limiting effective network bandwidth. We found that three factors determine how well the bandwidth is used: (1) the number of PEs used, (2) the size of the transferred data, and (3) data alignment in shared memory. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.",
author = "Yuri Nishikawa and Michihiro Koibuchi and Masato Yoshimi and Akihiro Shitara and Kenichi Miura and Hideharu Amano",
year = "2009",
doi = "10.1109/ISPA.2009.102",
language = "English",
isbn = "9780769537474",
pages = "203--210",
booktitle = "Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009",

}

TY - GEN

T1 - Performance analysis of clearspeed's CSX600 interconnects

AU - Nishikawa, Yuri

AU - Koibuchi, Michihiro

AU - Yoshimi, Masato

AU - Shitara, Akihiro

AU - Miura, Kenichi

AU - Amano, Hideharu

PY - 2009

Y1 - 2009

N2 - ClearSpeed's CSX600, which consists of 96 Processing Elements (PEs), employs a one-dimensional array topology for simple SIMD processing. To clarify the performance factors and practical issues of networks-on-chip (NoCs) in an existing modern many-core SIMD system, this paper measures and analyzes the CSX600's NoCs, Swazzle and ClearConnect. The evaluation and analysis show that sending and receiving overheads are the major factors limiting effective network bandwidth. We found that three factors determine how well the bandwidth is used: (1) the number of PEs used, (2) the size of the transferred data, and (3) data alignment in shared memory. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.

AB - ClearSpeed's CSX600, which consists of 96 Processing Elements (PEs), employs a one-dimensional array topology for simple SIMD processing. To clarify the performance factors and practical issues of networks-on-chip (NoCs) in an existing modern many-core SIMD system, this paper measures and analyzes the CSX600's NoCs, Swazzle and ClearConnect. The evaluation and analysis show that sending and receiving overheads are the major factors limiting effective network bandwidth. We found that three factors determine how well the bandwidth is used: (1) the number of PEs used, (2) the size of the transferred data, and (3) data alignment in shared memory. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.

UR - http://www.scopus.com/inward/record.url?scp=70449466935&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449466935&partnerID=8YFLogxK

U2 - 10.1109/ISPA.2009.102

DO - 10.1109/ISPA.2009.102

M3 - Conference contribution

SN - 9780769537474

SP - 203

EP - 210

BT - Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009

ER -