Prediction router: Yet another low latency on-chip router architecture

Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, Tsutomu Yoshinaga

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

58 Citations (Scopus)

Abstract

Network-on-Chips (NoCs) are quite latency sensitive, since their communication latency strongly affects application performance on recent many-core architectures. To reduce the communication latency, we propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes the switch arbitration. In prediction routers, incoming packets are transferred without waiting for the routing computation and switch arbitration if the prediction hits. Thus, the primary concern for reducing the communication latency is the hit rate of the prediction algorithms, which varies with the network environment, such as the network topology, routing algorithm, and traffic pattern. Although typical low-latency routers that speculatively skip one or more pipeline stages use a bypass datapath for specific packet transfers (e.g., packets moving along the same dimension), our prediction router predictively forwards packets based on a prediction algorithm selected from several candidates in response to the network environment. In this paper, we analyze the prediction hit rates of six prediction algorithms on meshes, tori, and fat trees. We then provide three case studies, each of which assumes a different many-core architecture. We have implemented a prediction router for each case study using a 65 nm CMOS process and evaluated them in terms of prediction hit rate, zero-load latency, hardware amount, and energy consumption. The results show that although the area and energy increase by 6.4-15.9% and 8.0-9.5%, respectively, up to 89.8% prediction hit rates are achieved in real applications, providing a favorable trade-off between the modest hardware/energy overheads and the latency saving.
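To make the hit/miss mechanism described in the abstract concrete, the following is a minimal Python sketch of how output-port prediction can shorten per-hop latency. It assumes a simplified pipeline (routing computation plus switch arbitration, then switch traversal), dimension-order (XY) routing on a 2-D mesh, and a "latest port" style predictor; the class names, cycle counts, and synthetic traffic are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's RTL): models the latency effect of output-port
# prediction in a router with a simplified pipeline (RC -> SA -> ST).
# The predictor policy and cycle counts below are illustrative assumptions.

import random

RC_SA_CYCLES = 2   # routing computation + switch arbitration (skipped on a hit)
ST_CYCLES    = 1   # switch traversal (always paid)

class LatestPortPredictor:
    """Predict that the next packet leaves through the same output port
    as the previous packet from the same input channel."""
    def __init__(self):
        self.last = {}                      # input port -> last output port
    def predict(self, in_port):
        return self.last.get(in_port)       # None until history exists
    def update(self, in_port, out_port):
        self.last[in_port] = out_port

def route_xy(cur, dst):
    """Dimension-order (XY) routing on a 2-D mesh; returns an output port name."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:
        return "EAST" if dx > cx else "WEST"
    if dy != cy:
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"

def simulate(packets, predictor):
    """packets: list of (in_port, cur_xy, dst_xy). Returns (hit_rate, avg_latency)."""
    hits, total_latency = 0, 0
    for in_port, cur, dst in packets:
        out_port = route_xy(cur, dst)                 # the "real" RC result
        if predictor.predict(in_port) == out_port:    # speculation was correct
            hits += 1
            total_latency += ST_CYCLES                # RC/SA were overlapped speculatively
        else:                                         # misprediction: fall back to normal pipeline
            total_latency += RC_SA_CYCLES + ST_CYCLES
        predictor.update(in_port, out_port)
    return hits / len(packets), total_latency / len(packets)

if __name__ == "__main__":
    random.seed(0)
    # Synthetic traffic as seen by the router at (1, 1) in a 4x4 mesh.
    pkts = [("WEST", (1, 1), (random.randrange(4), random.randrange(4)))
            for _ in range(10_000)]
    hit_rate, avg_lat = simulate(pkts, LatestPortPredictor())
    print(f"hit rate = {hit_rate:.2%}, average per-hop latency = {avg_lat:.2f} cycles")
```

Swapping in a different predict/update pair models a different prediction policy, which is the knob the paper varies across topologies, routing algorithms, and traffic patterns.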

Original language: English
Title of host publication: Proceedings - International Symposium on High-Performance Computer Architecture
Pages: 367-378
Number of pages: 12
ISBN (Print): 9781424429325
DOIs: https://doi.org/10.1109/HPCA.2009.4798274
Publication status: Published - 2009
Event: 2009 IEEE 15th International Symposium on High-Performance Computer Architecture, HPCA 2009 - Raleigh, NC, United States
Duration: 2009 Feb 14 → 2009 Feb 18

Other

Other: 2009 IEEE 15th International Symposium on High-Performance Computer Architecture, HPCA 2009
Country: United States
City: Raleigh, NC
Period: 09/2/14 → 09/2/18

Fingerprint

Routers
Communication
Switches
Hardware
Trees (mathematics)
Routing algorithms
Energy utilization
Pipelines
Topology

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Matsutani, H., Koibuchi, M., Amano, H., & Yoshinaga, T. (2009). Prediction router: Yet another low latency on-chip router architecture. In Proceedings - International Symposium on High-Performance Computer Architecture (pp. 367-378). [4798274] https://doi.org/10.1109/HPCA.2009.4798274
