Performance analysis of ClearSpeed's CSX600 interconnects

Yuri Nishikawa, Michihiro Koibuchi, Masato Yoshimi, Akihiro Shitara, Kenichi Miura, Hideharu Amano

Research output: Conference contribution

Abstract

ClearSpeed's CSX600, which consists of 96 Processing Elements (PEs), employs a one-dimensional array topology for simple SIMD processing. To clearly show the performance factors and practical issues of networks-on-chip (NoCs) in an existing modern many-core SIMD system, this paper measures and analyzes the CSX600's two NoCs, Swazzle and ClearConnect. Evaluation and analysis results show that the sending and receiving overheads are the major limiting factors on effective network bandwidth. We found that (1) the number of PEs used, (2) the size of the transferred data, and (3) data alignment in shared memory are the three main factors in making the best use of bandwidth. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.
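As a rough illustration of why fixed sending and receiving overheads limit effective bandwidth, and why transfer size matters, the sketch below uses a generic "fixed overhead plus size over peak bandwidth" cost model. This model, the function name `effective_bandwidth`, and the peak-bandwidth and overhead figures are all assumptions chosen for illustration; they are not measurements or formulas taken from the paper.

```python
def effective_bandwidth(size_bytes, peak_bw_bytes_per_s, overhead_s):
    """Effective bandwidth when every transfer pays a fixed overhead.

    Assumed cost model: time = overhead + size / peak bandwidth.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    transfer_time = overhead_s + size_bytes / peak_bw_bytes_per_s
    return size_bytes / transfer_time


if __name__ == "__main__":
    PEAK_BW = 1e9    # hypothetical peak bandwidth, bytes per second
    OVERHEAD = 1e-6  # hypothetical send + receive overhead, seconds
    for size in (64, 1024, 16384, 262144):
        bw = effective_bandwidth(size, PEAK_BW, OVERHEAD)
        print(f"{size:>7} B: {bw / PEAK_BW:.1%} of peak")
```

Under this model, small transfers are dominated by the fixed overhead, and the achieved fraction of peak bandwidth rises toward 100% only as the transfer size grows.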

Original language: English
Title of host publication: Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009
Pages: 203-210
Number of pages: 8
DOI: https://doi.org/10.1109/ISPA.2009.102
Publication status: Published - 2009
Event: 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009 - Chengdu, Sichuan, China
Duration: 9 Aug 2009 to 12 Aug 2009

Fingerprint

Processing
Bandwidth
Data transfer
Topology
Data storage equipment

ASJC Scopus subject areas

  • Computer Science Applications
  • Software

Cite this

Nishikawa, Y., Koibuchi, M., Yoshimi, M., Shitara, A., Miura, K., & Amano, H. (2009). Performance analysis of ClearSpeed's CSX600 interconnects. In Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009 (pp. 203-210). [5207934] https://doi.org/10.1109/ISPA.2009.102

@inproceedings{93bff7b088c4492f9a6b66f16a7843b4,
title = "Performance analysis of clearspeed's CSX600 interconnects",
abstract = "ClearSpeed's CSX600 that consists of 96 Processing Elements (PEs) employs a one-dimensional array topology for a simple SIMD processing. To clearly show the performance factors and practical issues of NoCs in an existing modern many-core SIMD system, this paper measures and analyzes NoCs of CSX600 called Swazzle and ClearConnect. Evaluation and analysis results show that the sending and receiving overheads are the major limitation factors to the effective network bandwidth. We found that (1) the number of used PEs, (2) the size of transferred data, and (3) data alignment of a shared memory are three main points to make the best use of bandwidth. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.",
author = "Yuri Nishikawa and Michihiro Koibuchi and Masato Yoshimi and Akihiro Shitara and Kenichi Miura and Hideharu Amano",
year = "2009",
doi = "10.1109/ISPA.2009.102",
language = "English",
isbn = "9780769537474",
pages = "203--210",
booktitle = "Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009",

}

TY - GEN
T1 - Performance analysis of clearspeed's CSX600 interconnects
AU - Nishikawa, Yuri
AU - Koibuchi, Michihiro
AU - Yoshimi, Masato
AU - Shitara, Akihiro
AU - Miura, Kenichi
AU - Amano, Hideharu
PY - 2009
Y1 - 2009
AB - ClearSpeed's CSX600 that consists of 96 Processing Elements (PEs) employs a one-dimensional array topology for a simple SIMD processing. To clearly show the performance factors and practical issues of NoCs in an existing modern many-core SIMD system, this paper measures and analyzes NoCs of CSX600 called Swazzle and ClearConnect. Evaluation and analysis results show that the sending and receiving overheads are the major limitation factors to the effective network bandwidth. We found that (1) the number of used PEs, (2) the size of transferred data, and (3) data alignment of a shared memory are three main points to make the best use of bandwidth. In addition, we estimated the best- and worst-case latencies of data transfers in parallel applications.
UR - http://www.scopus.com/inward/record.url?scp=70449466935&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449466935&partnerID=8YFLogxK
U2 - 10.1109/ISPA.2009.102
DO - 10.1109/ISPA.2009.102
M3 - Conference contribution
AN - SCOPUS:70449466935
SN - 9780769537474
SP - 203
EP - 210
BT - Proceedings - 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2009
ER -