A lightweight fault-tolerant mechanism for network-on-chip

Michihiro Koibuchi, Hiroki Matsutani, Hideharu Amano, Timothy Mark Pinkston

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

104 Citations (Scopus)

Abstract

Survival capability is becoming a crucial factor in designing multicore processors built with on-chip packet networks, or networks on chip (NoCs). In this paper, we propose a lightweight fault-tolerant mechanism for NoCs based on default backup paths (DBPs) designed to maintain, in the presence of failures, network connectivity of both nonfaulty routers and healthy processor cores that may be connected to faulty routers. The mechanism provides default paths as backups between certain router ports, which serve as alternative datapaths to circumvent failed components within a faulty router. Along with a minimal subset of normal network channels, the set of default backup paths internal to faulty routers forms, in the worst case, a unidirectional ring topology that provides network-wide connectivity to all processor cores. Routing using the DBP mechanism is proved to be deadlock-free with only two virtual channels even for fault scenarios in which regular networks degrade to irregular (arbitrary) topologies. Evaluation results show that, for a 2-D mesh wormhole NoC, only 12.6% additional hardware resources are needed to implement the proposed DBP mechanism in order to provide graceful performance degradation without chip-wide failure as the number of faults increases to the maximum needed to form a ring.
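The claim that two virtual channels are enough on the worst-case unidirectional ring can be illustrated with the classic dateline scheme. The sketch below is only an illustration under stated assumptions: the abstract does not give the paper's routing function, so the dateline rule, the channel model, and the names `ring_dependencies` and `has_cycle` are hypothetical and are not taken from the paper. It builds the channel dependency graph of a ring and checks it for cycles, since an acyclic graph is the standard sufficient condition for deadlock freedom in wormhole routing.

```python
from collections import defaultdict


def ring_dependencies(n, use_dateline):
    """Channel dependency graph for a unidirectional ring of n nodes.

    A channel is (node, vc): the link from node to (node + 1) % n on
    virtual channel vc. Hypothetical model, not the paper's router.
    """
    deps = defaultdict(set)
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            node, vc, prev = src, 0, None
            while node != dst:
                chan = (node, vc)
                if prev is not None:
                    deps[prev].add(chan)  # packet holds prev while requesting chan
                prev = chan
                nxt = (node + 1) % n
                # Assumed dateline rule: after wrapping to node 0, use VC 1.
                if use_dateline and nxt == 0:
                    vc = 1
                node = nxt
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(chan):
        colour[chan] = GREY
        for nxt in deps.get(chan, ()):
            if colour[nxt] == GREY or (colour[nxt] == WHITE and visit(nxt)):
                return True
        colour[chan] = BLACK
        return False

    return any(colour[c] == WHITE and visit(c) for c in list(deps))


if __name__ == "__main__":
    print(has_cycle(ring_dependencies(8, use_dateline=False)))  # True
    print(has_cycle(ring_dependencies(8, use_dateline=True)))   # False
```

With a single virtual channel the dependencies wrap around the ring and form a cycle; switching to a second virtual channel at the dateline breaks that cycle, which is consistent with the two-virtual-channel bound stated in the abstract.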

Original language: English
Title of host publication: Proceedings - Second IEEE International Symposium on Networks-on-Chip, NOCS 2008
Pages: 13-22
Number of pages: 10
DOIs: https://doi.org/10.1109/NOCS.2008.4492721
Publication status: Published - 2008
Event: 2nd IEEE International Symposium on Networks-on-Chip, NOCS 2008 - Newcastle upon Tyne, United Kingdom
Duration: 2008 Apr 7 - 2008 Apr 11

Other

Other: 2nd IEEE International Symposium on Networks-on-Chip, NOCS 2008
Country: United Kingdom
City: Newcastle upon Tyne
Period: 08/4/7 - 08/4/11

Fingerprint

Routers
Topology
Packet networks
Network-on-chip
Hardware
Degradation

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Koibuchi, M., Matsutani, H., Amano, H., & Pinkston, T. M. (2008). A lightweight fault-tolerant mechanism for network-on-chip. In Proceedings - Second IEEE International Symposium on Networks-on-Chip, NOCS 2008 (pp. 13-22). [4492721] https://doi.org/10.1109/NOCS.2008.4492721

@inproceedings{f7e8f982646d41cf9f709125cd31d9f4,
title = "A lightweight fault-tolerant mechanism for network-on-chip",
abstract = "Survival capability is becoming a crucial factor in designing multicore processors built with on-chip packet networks, or networks on chip (NoCs). In this paper, we propose a lightweight fault-tolerant mechanism for NoCs based on default backup paths (DBPs) designed to maintain, in the presence of failures, network connectivity of both nonfaulty routers as well as healthy processor cores which may be connected to faulty routers. The mechanism provides default paths as backup between certain router ports which serve as alternative datapaths to circumvent failed components within a faulty router. Along with a minimal subset of normal network channels, the set of default backup paths internal to faulty routers form - in the worst case - a unidirectional ring topology that provides network-wide connectivity to all processor cores. Routing using the DBP mechanism is proved to be deadlock-free with only two virtual channels even for fault scenarios in which regular networks degrade to irregular (arbitrary) topologies. Evaluation results show that, for a 2-D mesh wormhole NoC, only 12.6{\%} additional hardware resources are needed to implement the proposed DBP mechanism in order to provide graceful performance degradation without chip-wide failure as the number of faults increases to the maximum needed to form ring.",
author = "Koibuchi, Michihiro and Matsutani, Hiroki and Amano, Hideharu and Pinkston, Timothy Mark",
year = "2008",
doi = "10.1109/NOCS.2008.4492721",
language = "English",
isbn = "0769530982",
pages = "13--22",
booktitle = "Proceedings - Second IEEE International Symposium on Networks-on-Chip, NOCS 2008",
}

TY - GEN

T1 - A lightweight fault-tolerant mechanism for network-on-chip

AU - Koibuchi, Michihiro

AU - Matsutani, Hiroki

AU - Amano, Hideharu

AU - Pinkston, Timothy Mark

PY - 2008

Y1 - 2008

N2 - Survival capability is becoming a crucial factor in designing multicore processors built with on-chip packet networks, or networks on chip (NoCs). In this paper, we propose a lightweight fault-tolerant mechanism for NoCs based on default backup paths (DBPs) designed to maintain, in the presence of failures, network connectivity of both nonfaulty routers as well as healthy processor cores which may be connected to faulty routers. The mechanism provides default paths as backup between certain router ports which serve as alternative datapaths to circumvent failed components within a faulty router. Along with a minimal subset of normal network channels, the set of default backup paths internal to faulty routers form - in the worst case - a unidirectional ring topology that provides network-wide connectivity to all processor cores. Routing using the DBP mechanism is proved to be deadlock-free with only two virtual channels even for fault scenarios in which regular networks degrade to irregular (arbitrary) topologies. Evaluation results show that, for a 2-D mesh wormhole NoC, only 12.6% additional hardware resources are needed to implement the proposed DBP mechanism in order to provide graceful performance degradation without chip-wide failure as the number of faults increases to the maximum needed to form ring.

UR - http://www.scopus.com/inward/record.url?scp=44149126468&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44149126468&partnerID=8YFLogxK

U2 - 10.1109/NOCS.2008.4492721

DO - 10.1109/NOCS.2008.4492721

M3 - Conference contribution

SN - 0769530982

SN - 9780769530987

SP - 13

EP - 22

BT - Proceedings - Second IEEE International Symposium on Networks-on-Chip, NOCS 2008

ER -