TY - GEN
T1 - Demand-Aware Power Management for Power-Constrained HPC Systems
AU - Cao, Thang
AU - He, Yuan
AU - Kondo, Masaaki
N1 - Funding Information:
This work is supported by the Japan Science and Technology Agency (JST) CREST program for the research project named Power Management Framework for Post-Petascale Supercomputers. We would like to thank all of the project members for their valuable comments. We are also grateful to the Research Institute for Information Technology of Kyushu University for providing us the resources and support to conduct the large-scale power measurements.
Publisher Copyright:
© 2016 IEEE.
PY - 2016/7/18
Y1 - 2016/7/18
N2 - As limited power budget is becoming one of the most crucial challenges in developing supercomputer systems, hardware overprovisioning which installs larger number of nodes beyond the limitations of the power constraint determined by Thermal Design Power is an attractive way to design extreme-scale supercomputers. In this design, power consumption of each node should be controlled by power-knobs equipped in the hardware such as dynamic voltage and frequency scaling (DVFS) or power capping mechanisms. Traditionally, in supercomputer systems, schedulers determine when and where to allocate jobs. In overprovisioned systems, the schedulers also need to care about power allocation to each job. An easy way is to set a fixed power cap for each job so that the total power consumption is within the power constraint of the system. This fixed power capping does not necessarily provide good performance since the effective power usage of jobs changes throughout their execution. Moreover, because each job has its own performance requirement, fixed power cap may not work well for all the jobs. In this paper, we propose a demand-aware power management framework for overprovisioned and power-constrained high-performance computing (HPC) systems. The job scheduler selects a job to run based on available hardware and power resources. The power manager continuously monitors power usage, predicts performance of executing jobs and optimizes power cap of each CPU so that the required performance level of each job is satisfied while improving system throughput by making good use of available power budget. Experiments on a real HPC system and with simulation for a large scale system show that the power manager can successfully control power consumption of executing jobs while achieving 1.17x improvement in system throughput.
AB - As limited power budget is becoming one of the most crucial challenges in developing supercomputer systems, hardware overprovisioning which installs larger number of nodes beyond the limitations of the power constraint determined by Thermal Design Power is an attractive way to design extreme-scale supercomputers. In this design, power consumption of each node should be controlled by power-knobs equipped in the hardware such as dynamic voltage and frequency scaling (DVFS) or power capping mechanisms. Traditionally, in supercomputer systems, schedulers determine when and where to allocate jobs. In overprovisioned systems, the schedulers also need to care about power allocation to each job. An easy way is to set a fixed power cap for each job so that the total power consumption is within the power constraint of the system. This fixed power capping does not necessarily provide good performance since the effective power usage of jobs changes throughout their execution. Moreover, because each job has its own performance requirement, fixed power cap may not work well for all the jobs. In this paper, we propose a demand-aware power management framework for overprovisioned and power-constrained high-performance computing (HPC) systems. The job scheduler selects a job to run based on available hardware and power resources. The power manager continuously monitors power usage, predicts performance of executing jobs and optimizes power cap of each CPU so that the required performance level of each job is satisfied while improving system throughput by making good use of available power budget. Experiments on a real HPC system and with simulation for a large scale system show that the power manager can successfully control power consumption of executing jobs while achieving 1.17x improvement in system throughput.
KW - Adaptive Power Management
KW - Extreme-Scale Computing
KW - Hardware Overprovisioned System
KW - HPC System
KW - Job Scheduling
UR - http://www.scopus.com/inward/record.url?scp=84983468111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84983468111&partnerID=8YFLogxK
U2 - 10.1109/CCGrid.2016.25
DO - 10.1109/CCGrid.2016.25
M3 - Conference contribution
AN - SCOPUS:84983468111
T3 - Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
SP - 21
EP - 31
BT - Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Y2 - 16 May 2016 through 19 May 2016
ER -