Phase-based reboot: Reusing operating system execution phases for cheap reboot-based recovery

Kazuya Yamakita, Hiroshi Yamada, Kenji Kono

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Citations (Scopus)

Abstract

Although operating systems (OSes) are crucial to achieving high availability of computer systems, modern OSes are far from bug-free. Rebooting the OS is simple, powerful, and sometimes the only remedy for kernel failures. Once we accept reboot-based recovery as a fact of life, we should try to ensure that the downtime caused by reboots is as short as possible. This paper presents phase-based reboots that shorten the downtime caused by reboot-based recovery. The key idea is to divide a boot sequence into phases. The phase-based reboot reuses a system state in the previous boot if the next boot reproduces the same state. A prototype of the phase-based reboot was implemented on Xen 3.4.1 running para-virtualized Linux 2.6.18. Experiments with the prototype show that it successfully recovered from kernel transient failures inserted by a fault injector, and its downtime was 34.3 to 93.6% shorter than that of the normal reboot-based recovery.
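
The key idea in the abstract (split the boot sequence into phases and reuse the state of the previous boot when the next boot would reproduce it) can be illustrated with a small toy model. The sketch below is not the authors' Xen-based prototype, and the abstract does not say how the prototype decides that a state is reproducible. Here that decision is assumed to be a hash over each phase's inputs. All names (Phase, PhaseBasedBooter, read_inputs) are hypothetical.

```python
import copy
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Phase:
    """One step of a (toy) boot sequence. Hypothetical structure."""
    name: str
    read_inputs: Callable[[], bytes]   # data the phase depends on, e.g. config bytes
    run: Callable[[dict], dict]        # maps the previous system state to the next one


class PhaseBasedBooter:
    """Reuses saved per-phase state when a reboot would reproduce it (assumed policy)."""

    def __init__(self, phases: List[Phase]) -> None:
        self.phases = phases
        # phase index -> (fingerprint of all inputs up to that phase, saved state)
        self.snapshots: Dict[int, Tuple[str, dict]] = {}

    def boot(self) -> Tuple[dict, List[str]]:
        # Fingerprint every prefix of the boot sequence, so that a change in an
        # early phase also invalidates the snapshots of all later phases.
        digest = hashlib.sha256()
        fingerprints: List[str] = []
        for phase in self.phases:
            digest.update(phase.name.encode())
            digest.update(phase.read_inputs())
            fingerprints.append(digest.hexdigest())

        # Find the latest phase whose saved state is still valid and resume after it.
        state: dict = {}
        start = 0
        for i in reversed(range(len(self.phases))):
            saved = self.snapshots.get(i)
            if saved is not None and saved[0] == fingerprints[i]:
                state = copy.deepcopy(saved[1])
                start = i + 1
                break

        # Execute only the remaining phases, snapshotting the state after each.
        executed: List[str] = []
        for i in range(start, len(self.phases)):
            state = self.phases[i].run(state)
            self.snapshots[i] = (fingerprints[i], copy.deepcopy(state))
            executed.append(self.phases[i].name)
        return state, executed
```

In this model, a second call to boot() with unchanged inputs executes no phases at all, while changing the inputs of a late phase re-runs only that phase and the ones after it. This is the effect the abstract attributes to reusing earlier phases, shown here only at the level of a toy.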

Original language: English
Title of host publication: Proceedings of the International Conference on Dependable Systems and Networks
Pages: 169-180
Number of pages: 12
DOI: https://doi.org/10.1109/DSN.2011.5958216
Publication status: Published - 2011
Event: 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011 - Hong Kong, Hong Kong
Duration: 2011 Jun 27 to 2011 Jun 30

Other

Other: 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011
Country: Hong Kong
City: Hong Kong
Period: 11/6/27 to 11/6/30

Fingerprint

Recovery
Computer systems
Availability
Experiments
Linux

Keywords

  • Operating System Reliability
  • Reboot-based Recovery
  • Virtualization

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Yamakita, K., Yamada, H., & Kono, K. (2011). Phase-based reboot: Reusing operating system execution phases for cheap reboot-based recovery. In Proceedings of the International Conference on Dependable Systems and Networks (pp. 169-180). [5958216] https://doi.org/10.1109/DSN.2011.5958216

@inproceedings{4cf149bc9d9b4fcaa46db33844ab3b3f,
title = "Phase-based reboot: Reusing operating system execution phases for cheap reboot-based recovery",
abstract = "Although operating systems (OSes) are crucial to achieving high availability of computer systems, modern OSes are far from bug-free. Rebooting the OS is simple, powerful, and sometimes the only remedy for kernel failures. Once we accept reboot-based recovery as a fact of life, we should try to ensure that the downtime caused by reboots is as short as possible. This paper presents phase-based reboots that shorten the downtime caused by reboot-based recovery. The key idea is to divide a boot sequence into phases. The phase-based reboot reuses a system state in the previous boot if the next boot reproduces the same state. A prototype of the phase-based reboot was implemented on Xen 3.4.1 running para-virtualized Linux 2.6.18. Experiments with the prototype show that it successfully recovered from kernel transient failures inserted by a fault injector, and its downtime was 34.3 to 93.6{\%} shorter than that of the normal reboot-based recovery.",
keywords = "Operating System Reliability, Reboot-based Recovery, Virtualization",
author = "Kazuya Yamakita and Hiroshi Yamada and Kenji Kono",
year = "2011",
doi = "10.1109/DSN.2011.5958216",
language = "English",
isbn = "9781424492336",
pages = "169--180",
booktitle = "Proceedings of the International Conference on Dependable Systems and Networks",
}

TY - GEN

T1 - Phase-based reboot

T2 - Reusing operating system execution phases for cheap reboot-based recovery

AU - Yamakita, Kazuya

AU - Yamada, Hiroshi

AU - Kono, Kenji

PY - 2011

Y1 - 2011

N2 - Although operating systems (OSes) are crucial to achieving high availability of computer systems, modern OSes are far from bug-free. Rebooting the OS is simple, powerful, and sometimes the only remedy for kernel failures. Once we accept reboot-based recovery as a fact of life, we should try to ensure that the downtime caused by reboots is as short as possible. This paper presents phase-based reboots that shorten the downtime caused by reboot-based recovery. The key idea is to divide a boot sequence into phases. The phase-based reboot reuses a system state in the previous boot if the next boot reproduces the same state. A prototype of the phase-based reboot was implemented on Xen 3.4.1 running para-virtualized Linux 2.6.18. Experiments with the prototype show that it successfully recovered from kernel transient failures inserted by a fault injector, and its downtime was 34.3 to 93.6% shorter than that of the normal reboot-based recovery.

KW - Operating System Reliability

KW - Reboot-based Recovery

KW - Virtualization

UR - http://www.scopus.com/inward/record.url?scp=80051948356&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051948356&partnerID=8YFLogxK

U2 - 10.1109/DSN.2011.5958216

DO - 10.1109/DSN.2011.5958216

M3 - Conference contribution

AN - SCOPUS:80051948356

SN - 9781424492336

SP - 169

EP - 180

BT - Proceedings of the International Conference on Dependable Systems and Networks

ER -