An analytical network performance model for SIMD processor CSX600 interconnects

Yuri Nishikawa, Michihiro Koibuchi, Masato Yoshimi, Kenichi Miura, Hideharu Amano

Research output: Contribution to journal › Article

Abstract

One of the essential factors in efficiently implementing and tuning applications on an SIMD many-core processor is familiarity with the schematics and performance of its network-on-chip (NoC) architecture. This paper focuses on modeling the end-to-end latency of a one-dimensional SIMD many-core processor. To study the precise and practical characteristics of end-to-end latency in modern SIMD many-core processors, this work analyzes the performance of Swazzle and ClearConnect, both of which are one-dimensional NoCs of ClearSpeed's CSX600, an SIMD processor consisting of 96 Processing Elements (PEs). Evaluation and analysis show that (1) the number of PEs used, (2) the size of the transferred data, and (3) the data alignment in shared memory are the dominant factors in the network performance of the CSX600. Based on these observations, we built a model for computing communication time. Using the model, we estimated the best- and worst-case latencies for traffic patterns taken from several parallel application benchmarks. Finally, we confirmed that the actual communication times of the benchmarks fall between the best- and worst-case values.

Original language: English
Pages (from-to): 146-159
Number of pages: 14
Journal: Journal of Systems Architecture
Volume: 57
Issue number: 1
DOI: 10.1016/j.sysarc.2010.10.004
Publication status: Published - 2011 Jan

Fingerprint

Network performance
Communication
Schematic diagrams
Processing
Tuning
Data storage equipment
Network-on-chip

Keywords

  • Many-core processor
  • Network-on-chips (NoCs)
  • SIMD

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software

Cite this

An analytical network performance model for SIMD processor CSX600 interconnects. / Nishikawa, Yuri; Koibuchi, Michihiro; Yoshimi, Masato; Miura, Kenichi; Amano, Hideharu.

In: Journal of Systems Architecture, Vol. 57, No. 1, 01.2011, p. 146-159.

@article{e70fc836c1cd45e9a18fc588e05bf6fc,
title = "An analytical network performance model for SIMD processor CSX600 interconnects",
keywords = "Many-core processor, Network-on-chips (NoCs), SIMD",
author = "Yuri Nishikawa and Michihiro Koibuchi and Masato Yoshimi and Kenichi Miura and Hideharu Amano",
year = "2011",
month = "1",
doi = "10.1016/j.sysarc.2010.10.004",
language = "English",
volume = "57",
pages = "146--159",
journal = "Journal of Systems Architecture",
issn = "1383-7621",
publisher = "Elsevier",
number = "1",

}

TY - JOUR

T1 - An analytical network performance model for SIMD processor CSX600 interconnects

AU - Nishikawa, Yuri

AU - Koibuchi, Michihiro

AU - Yoshimi, Masato

AU - Miura, Kenichi

AU - Amano, Hideharu

PY - 2011/1

Y1 - 2011/1

KW - Many-core processor

KW - Network-on-chips (NoCs)

KW - SIMD

UR - http://www.scopus.com/inward/record.url?scp=78650271799&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78650271799&partnerID=8YFLogxK

U2 - 10.1016/j.sysarc.2010.10.004

DO - 10.1016/j.sysarc.2010.10.004

M3 - Article

AN - SCOPUS:78650271799

VL - 57

SP - 146

EP - 159

JO - Journal of Systems Architecture

JF - Journal of Systems Architecture

SN - 1383-7621

IS - 1

ER -