Prediction router: A low-latency on-chip router architecture with multiple predictors

Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, Tsutomu Yoshinaga

Research output: Contribution to journalArticle

17 Citations (Scopus)

Abstract

Multi and many-core applications are sensitive to interprocessor communication latencies, suggesting the need for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration to reduce communication latency. The packets coming into the prediction routers are transferred without waiting for the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing communication latency is the hit rates of the prediction algorithms, which vary based on network environments, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that skip one or more pipeline stages use a bypass data path that is based on a static or single bypassing policy (e.g., accelerating the packets moving in the same dimension), our prediction router architecture predictively forwards packets based on the prediction algorithm selected from among several candidates in response to the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. Then, we present four case studies, each of which assumes different many-core architectures. We implemented the prediction routers for each case study by using a 45 nm CMOS process, and evaluated them in terms of the prediction hit rate, zero-load latency, hardware amount, and energy consumption. A typical prediction router with two or three predictors shows that although the area and energy are increased by 4.8-12.0 percent and 5.3 percent, respectively, up to 89.8 percent of the prediction hit rate is achieved in real applications, which provides favorable trade-offs between modest hardware/energy overheads and significant latency saving.

Original languageEnglish
Article number5703069
Pages (from-to)783-799
Number of pages17
JournalIEEE Transactions on Computers
Volume60
Issue number6
DOIs
Publication statusPublished - 2011

Fingerprint

Router
Routers
Latency
Predictors
Chip
Prediction
Hits
Percent
Arbitration
Many-core
Communication
Switch
Architecture
Switches
Hardware
Interprocessor Communication
Trees (mathematics)
Routing algorithms
Routing Algorithm
Oils and fats

Keywords

  • Interconnection networks
  • low-latency router architecture.
  • on-chip networks

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Prediction router : A low-latency on-chip router architecture with multiple predictors. / Matsutani, Hiroki; Koibuchi, Michihiro; Amano, Hideharu; Yoshinaga, Tsutomu.

In: IEEE Transactions on Computers, Vol. 60, No. 6, 5703069, 2011, p. 783-799.

Research output: Contribution to journalArticle

@article{aa8cde9ad1584a01abd0ab8b7dcc8c0f,
title = "Prediction router: A low-latency on-chip router architecture with multiple predictors",
abstract = "Multi and many-core applications are sensitive to interprocessor communication latencies, suggesting the need for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration to reduce communication latency. The packets coming into the prediction routers are transferred without waiting for the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing communication latency is the hit rates of the prediction algorithms, which vary based on network environments, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that skip one or more pipeline stages use a bypass data path that is based on a static or single bypassing policy (e.g., accelerating the packets moving in the same dimension), our prediction router architecture predictively forwards packets based on the prediction algorithm selected from among several candidates in response to the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. Then, we present four case studies, each of which assumes different many-core architectures. We implemented the prediction routers for each case study by using a 45 nm CMOS process, and evaluated them in terms of the prediction hit rate, zero-load latency, hardware amount, and energy consumption. A typical prediction router with two or three predictors shows that although the area and energy are increased by 4.8-12.0 percent and 5.3 percent, respectively, up to 89.8 percent of the prediction hit rate is achieved in real applications, which provides favorable trade-offs between modest hardware/energy overheads and significant latency saving.",
keywords = "Interconnection networks, low-latency router architecture., on-chip networks",
author = "Hiroki Matsutani and Michihiro Koibuchi and Hideharu Amano and Tsutomu Yoshinaga",
year = "2011",
doi = "10.1109/TC.2011.17",
language = "English",
volume = "60",
pages = "783--799",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "6",

}

TY - JOUR

T1 - Prediction router

T2 - A low-latency on-chip router architecture with multiple predictors

AU - Matsutani, Hiroki

AU - Koibuchi, Michihiro

AU - Amano, Hideharu

AU - Yoshinaga, Tsutomu

PY - 2011

Y1 - 2011

N2 - Multi and many-core applications are sensitive to interprocessor communication latencies, suggesting the need for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration to reduce communication latency. The packets coming into the prediction routers are transferred without waiting for the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing communication latency is the hit rates of the prediction algorithms, which vary based on network environments, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that skip one or more pipeline stages use a bypass data path that is based on a static or single bypassing policy (e.g., accelerating the packets moving in the same dimension), our prediction router architecture predictively forwards packets based on the prediction algorithm selected from among several candidates in response to the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. Then, we present four case studies, each of which assumes different many-core architectures. We implemented the prediction routers for each case study by using a 45 nm CMOS process, and evaluated them in terms of the prediction hit rate, zero-load latency, hardware amount, and energy consumption. A typical prediction router with two or three predictors shows that although the area and energy are increased by 4.8-12.0 percent and 5.3 percent, respectively, up to 89.8 percent of the prediction hit rate is achieved in real applications, which provides favorable trade-offs between modest hardware/energy overheads and significant latency saving.

AB - Multi and many-core applications are sensitive to interprocessor communication latencies, suggesting the need for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration to reduce communication latency. The packets coming into the prediction routers are transferred without waiting for the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing communication latency is the hit rates of the prediction algorithms, which vary based on network environments, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that skip one or more pipeline stages use a bypass data path that is based on a static or single bypassing policy (e.g., accelerating the packets moving in the same dimension), our prediction router architecture predictively forwards packets based on the prediction algorithm selected from among several candidates in response to the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. Then, we present four case studies, each of which assumes different many-core architectures. We implemented the prediction routers for each case study by using a 45 nm CMOS process, and evaluated them in terms of the prediction hit rate, zero-load latency, hardware amount, and energy consumption. A typical prediction router with two or three predictors shows that although the area and energy are increased by 4.8-12.0 percent and 5.3 percent, respectively, up to 89.8 percent of the prediction hit rate is achieved in real applications, which provides favorable trade-offs between modest hardware/energy overheads and significant latency saving.

KW - Interconnection networks

KW - low-latency router architecture.

KW - on-chip networks

UR - http://www.scopus.com/inward/record.url?scp=79955487037&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79955487037&partnerID=8YFLogxK

U2 - 10.1109/TC.2011.17

DO - 10.1109/TC.2011.17

M3 - Article

AN - SCOPUS:79955487037

VL - 60

SP - 783

EP - 799

JO - IEEE Transactions on Computers

JF - IEEE Transactions on Computers

SN - 0018-9340

IS - 6

M1 - 5703069

ER -