TY - GEN
T1 - Phase-based reboot
T2 - 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011
AU - Yamakita, Kazuya
AU - Yamada, Hiroshi
AU - Kono, Kenji
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
AB - Although operating systems (OSes) are crucial to achieving high availability of computer systems, modern OSes are far from bug-free. Rebooting the OS is simple, powerful, and sometimes the only remedy for kernel failures. Once we accept reboot-based recovery as a fact of life, we should try to ensure that the downtime caused by reboots is as short as possible. This paper presents phase-based reboots that shorten the downtime caused by reboot-based recovery. The key idea is to divide a boot sequence into phases. The phase-based reboot reuses a system state in the previous boot if the next boot reproduces the same state. A prototype of the phase-based reboot was implemented on Xen 3.4.1 running para-virtualized Linux 2.6.18. Experiments with the prototype show that it successfully recovered from kernel transient failures inserted by a fault injector, and its downtime was 34.3 to 93.6% shorter than that of the normal reboot-based recovery.
KW - Operating System Reliability
KW - Reboot-based Recovery
KW - Virtualization
UR - http://www.scopus.com/inward/record.url?scp=80051948356&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051948356&partnerID=8YFLogxK
U2 - 10.1109/DSN.2011.5958216
DO - 10.1109/DSN.2011.5958216
M3 - Conference contribution
AN - SCOPUS:80051948356
SN - 9781424492336
T3 - Proceedings of the International Conference on Dependable Systems and Networks
SP - 169
EP - 180
BT - 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks, DSN 2011
Y2 - 27 June 2011 through 30 June 2011
ER -