TY - GEN
T1 - Cooling-Aware Job Scheduling and Node Allocation for Overprovisioned HPC Systems
AU - Cao, Thang
AU - Huang, Wei
AU - He, Yuan
AU - Kondo, Masaaki
N1 - Funding Information:
This work is supported by the Japan Science and Technology Agency (JST) CREST program for the research project named Power Management Framework for Post-Petascale Supercomputers. We would like to thank all of the project members for their valuable comments. We are also thankful to the Research Institute for Information Technology of Kyushu University for helping us to conduct experiments on a real small-scale HPC system.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/6/30
Y1 - 2017/6/30
N2 - A limited power budget is becoming one of the most crucial challenges in developing supercomputer systems. Hardware overprovisioning, which installs more nodes than the power constraint would otherwise allow, is an attractive way to design next-generation supercomputers. In air-cooled HPC centers, about half of the total power is consumed by cooling facilities. Reducing cooling power and effectively utilizing the power resource for compute nodes are therefore important challenges. The cooling power is known to depend on the hotspot temperature of the node inlets, so minimizing the hotspot temperature increases the performance efficiency of the HPC system. One way to reduce the hotspot temperature is to allocate power-hungry jobs to compute nodes whose effect on the hotspot temperature is small. This can be accomplished by optimizing job-to-node mapping in the job scheduler. In this paper, we propose a cooling- and node-location-aware job scheduling strategy that tries to optimize job-to-node mapping while improving the total system throughput under a constraint on total system (compute nodes and cooling facilities) power consumption. Experimental results from job scheduling simulation show that our scheduling scheme achieves 1.49X higher total system throughput than the conventional scheme.
AB - A limited power budget is becoming one of the most crucial challenges in developing supercomputer systems. Hardware overprovisioning, which installs more nodes than the power constraint would otherwise allow, is an attractive way to design next-generation supercomputers. In air-cooled HPC centers, about half of the total power is consumed by cooling facilities. Reducing cooling power and effectively utilizing the power resource for compute nodes are therefore important challenges. The cooling power is known to depend on the hotspot temperature of the node inlets, so minimizing the hotspot temperature increases the performance efficiency of the HPC system. One way to reduce the hotspot temperature is to allocate power-hungry jobs to compute nodes whose effect on the hotspot temperature is small. This can be accomplished by optimizing job-to-node mapping in the job scheduler. In this paper, we propose a cooling- and node-location-aware job scheduling strategy that tries to optimize job-to-node mapping while improving the total system throughput under a constraint on total system (compute nodes and cooling facilities) power consumption. Experimental results from job scheduling simulation show that our scheduling scheme achieves 1.49X higher total system throughput than the conventional scheme.
KW - Cooling-Aware
KW - Hardware Overprovisioned System
KW - Parallel Job Scheduling
KW - Power-Constrained HPC System
UR - http://www.scopus.com/inward/record.url?scp=85027694533&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027694533&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2017.19
DO - 10.1109/IPDPS.2017.19
M3 - Conference contribution
AN - SCOPUS:85027694533
T3 - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
SP - 728
EP - 737
BT - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017
Y2 - 29 May 2017 through 2 June 2017
ER -