Cooling-Aware Job Scheduling and Node Allocation for Overprovisioned HPC Systems

Thang Cao, Wei Huang, Yuan He, Masaaki Kondo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

21 Citations (Scopus)

Abstract

A limited power budget is becoming one of the most crucial challenges in developing supercomputer systems. Hardware overprovisioning, which installs more nodes than the power constraint nominally allows, is an attractive way to design next-generation supercomputers. In air-cooled HPC centers, about half of the total power is consumed by cooling facilities. Reducing cooling power and effectively utilizing power resources for compute nodes are therefore important challenges. Cooling power is known to depend on the hotspot temperature of the node inlets. Therefore, minimizing the hotspot temperature reduces cooling power and increases the performance efficiency of the HPC system. One way to reduce the hotspot temperature is to allocate power-hungry jobs to compute nodes whose effect on the hotspot temperature is small, which can be accomplished by optimizing the job-to-node mapping in the job scheduler. In this paper, we propose a cooling- and node-location-aware job scheduling strategy that tries to optimize the job-to-node mapping while improving total system throughput under a constraint on total system power consumption (compute nodes and cooling facilities). Experimental results from job scheduling simulation show that our scheduling scheme achieves 1.49X higher total system throughput than the conventional scheme.
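
The mapping idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the per-node `node_heat_influence` coefficients, and the linear hotspot proxy (sum of job power times node influence) are all assumptions made purely for illustration. Under that assumed proxy, pairing the most power-hungry job with the least influential node minimizes the sum.

```python
from typing import Dict


def map_jobs_to_nodes(job_power_w: Dict[str, float],
                      node_heat_influence: Dict[str, float]) -> Dict[str, str]:
    """Greedy pairing (hypothetical): the most power-hungry job goes to the
    node with the smallest assumed influence on the hotspot temperature."""
    if len(job_power_w) > len(node_heat_influence):
        raise ValueError("more jobs than free nodes")
    # Jobs in descending order of power draw.
    jobs = sorted(job_power_w, key=job_power_w.get, reverse=True)
    # Nodes in ascending order of assumed hotspot influence.
    nodes = sorted(node_heat_influence, key=node_heat_influence.get)
    return dict(zip(jobs, nodes))


def hotspot_proxy(mapping: Dict[str, str],
                  job_power_w: Dict[str, float],
                  node_heat_influence: Dict[str, float]) -> float:
    """Assumed linear proxy for hotspot heating; not a real thermal model."""
    return sum(job_power_w[j] * node_heat_influence[n]
               for j, n in mapping.items())


if __name__ == "__main__":
    jobs = {"a": 300.0, "b": 100.0, "c": 200.0}   # watts, made-up values
    nodes = {"n1": 0.5, "n2": 0.1, "n3": 0.3}     # made-up coefficients
    aware = map_jobs_to_nodes(jobs, nodes)
    naive = dict(zip(jobs, nodes))                # first-come, first-placed
    print(aware)
    print(hotspot_proxy(aware, jobs, nodes), hotspot_proxy(naive, jobs, nodes))
```

A real scheduler would also have to respect the total power constraint and multi-node jobs, which this sketch leaves out.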

Original language: English
Title of host publication: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 728-737
Number of pages: 10
ISBN (Electronic): 9781538639146
Publication status: Published - 2017 Jun 30
Externally published: Yes
Event: 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017 - Orlando, United States
Duration: 2017 May 29 to 2017 Jun 2

Publication series

Name: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017

Conference

Conference: 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017
Country/Territory: United States
City: Orlando
Period: 17/5/29 to 17/6/2

Keywords

  • Cooling-Aware
  • Hardware Overprovisioned System
  • Parallel Job Scheduling
  • Power-Constrained HPC System

ASJC Scopus subject areas

  • Information Systems
  • Computer Networks and Communications
  • Hardware and Architecture
