Demand-Aware Power Management for Power-Constrained HPC Systems

Thang Cao, Yuan He, Masaaki Kondo

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

21 Citations (Scopus)

Abstract

As the limited power budget becomes one of the most crucial challenges in developing supercomputer systems, hardware overprovisioning, which installs a larger number of nodes than the power constraint determined by Thermal Design Power would normally allow, is an attractive way to design extreme-scale supercomputers. In this design, the power consumption of each node should be controlled by power knobs provided by the hardware, such as dynamic voltage and frequency scaling (DVFS) or power capping mechanisms. Traditionally, in supercomputer systems, schedulers determine when and where to allocate jobs. In overprovisioned systems, the schedulers also need to manage the power allocated to each job. A simple approach is to set a fixed power cap for each job so that the total power consumption stays within the system's power constraint. This fixed power capping does not necessarily provide good performance, since the effective power usage of jobs changes throughout their execution. Moreover, because each job has its own performance requirement, a fixed power cap may not work well for all jobs. In this paper, we propose a demand-aware power management framework for overprovisioned and power-constrained high-performance computing (HPC) systems. The job scheduler selects a job to run based on the available hardware and power resources. The power manager continuously monitors power usage, predicts the performance of executing jobs, and optimizes the power cap of each CPU so that the required performance level of each job is satisfied while improving system throughput by making good use of the available power budget. Experiments on a real HPC system and simulations of a large-scale system show that the power manager can successfully control the power consumption of executing jobs while achieving a 1.17x improvement in system throughput.
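The abstract describes the power manager only at a high level. The following is a minimal Python sketch, not the authors' algorithm, that contrasts a fixed per-job power cap with a demand-aware split of the same system budget. It assumes each job's minimum power for its required performance level (`min_power`) and its currently usable power (`demand`) are already known; in the paper these would come from power monitoring and performance prediction. The names `Job`, `fixed_caps`, and `demand_aware_caps`, the wattages, and the equal-slice (water-filling) redistribution rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record; both wattages are assumed inputs, not measured here.
    name: str
    min_power: float  # W needed to meet the job's required performance level
    demand: float     # W the job can usefully draw in its current phase


def fixed_caps(jobs, budget):
    """Baseline: split the system budget evenly, ignoring per-job needs."""
    share = budget / len(jobs)
    return {j.name: share for j in jobs}


def demand_aware_caps(jobs, budget, eps=1e-9):
    """Guarantee each job's minimum, then water-fill the spare budget
    in equal slices among jobs whose demand is not yet met."""
    caps = {j.name: j.min_power for j in jobs}
    spare = budget - sum(caps.values())
    if spare < 0:
        raise ValueError("budget cannot meet every job's required performance")
    hungry = [j for j in jobs if j.demand - caps[j.name] > eps]
    while spare > eps and hungry:
        slice_w = spare / len(hungry)
        for j in hungry:
            grant = min(slice_w, j.demand - caps[j.name])
            caps[j.name] += grant
            spare -= grant
        # Drop jobs that reached their demand; the leftover goes to the rest.
        hungry = [j for j in hungry if j.demand - caps[j.name] > eps]
    return caps  # some budget stays unused if total demand < budget


jobs = [
    Job("A", min_power=60, demand=80),
    Job("B", min_power=50, demand=150),
    Job("C", min_power=110, demand=200),
]

if __name__ == "__main__":
    for label, caps in (("fixed", fixed_caps(jobs, 300)),
                        ("demand-aware", demand_aware_caps(jobs, 300))):
        print(label, {k: round(v, 1) for k, v in caps.items()})
```

With a 300 W budget, the fixed split gives every job 100 W, so job C falls short of its assumed 110 W minimum while job A receives 20 W it cannot use. The demand-aware split yields caps of 80, 80, and 140 W, meeting every minimum and spending the whole budget.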

Original language: English
Title of host publication: Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 21-31
Number of pages: 11
ISBN (Electronic): 9781509024520
Publication status: Published - 2016 Jul 18
Externally published: Yes
Event: 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016 - Cartagena, Colombia
Duration: 2016 May 16 to 2016 May 19

Publication series

Name: Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016

Conference

Conference: 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016
Country/Territory: Colombia
City: Cartagena
Period: 16/5/16 to 16/5/19

Keywords

  • Adaptive Power Management
  • Extreme-Scale Computing
  • Hardware Overprovisioned System
  • HPC System
  • Job Scheduling

ASJC Scopus subject areas

  • Computer Networks and Communications

