Skew-Aware Collective Communication for MapReduce Shuffling

Harunobu Daikoku, Hideyuki Kawashima, Osamu Tatebe

Research output: Conference contribution


This paper proposes and examines three in-memory shuffling methods designed to address the problems that skewed data causes in MapReduce shuffling. Coupled Shuffle Architecture (CSA) uses a single pairwise all-to-all exchange to shuffle both blocks (the units of shuffle transfer) and meta-blocks (which contain the metadata of the corresponding blocks). Decoupled Shuffle Architecture (DSA) shuffles meta-blocks and blocks separately and applies a different all-to-all exchange algorithm to each, aiming to mitigate the impact of stragglers under strongly skewed distributions. Decoupled Shuffle Architecture with Skew-Aware Meta-Shuffle (DSA w/ SMS) autonomously determines the proper placement of blocks based on the memory consumption of each worker process. It targets extremely skewed situations in which some worker processes could exceed the memory limit of their node. This study evaluates implementations of the three shuffling methods in our prototype in-memory MapReduce engine, which uses high-performance interconnects such as InfiniBand and Intel Omni-Path. Our results suggest that DSA w/ SMS is the only viable solution for extremely skewed data distributions, but that it is valid only on systems equipped with high-performance interconnects. We also present a detailed investigation of the performance of CSA and DSA under various skew conditions.
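To make the two communication ideas above more concrete, the Python sketch below shows (1) the schedule of a textbook ring-shifted pairwise all-to-all exchange, one common way to realize the kind of exchange CSA is described as using, and (2) a purely hypothetical memory-aware block placement in the spirit of DSA w/ SMS. The abstract does not specify either algorithm. The shift-based schedule, the greedy "largest block first, fall back to the least-loaded worker" rule, and all names (`pairwise_schedule`, `place_blocks`, `capacity`) are illustrative assumptions, not the authors' implementation. The placement sketch also assumes that every worker already knows the block sizes, for example from meta-blocks exchanged beforehand.

```python
from typing import Dict, List, Tuple


def pairwise_schedule(num_procs: int) -> List[List[Tuple[int, int]]]:
    """For each of the num_procs - 1 steps, return the (send_to, recv_from)
    pair of every rank in a ring-shifted pairwise all-to-all exchange."""
    steps = []
    for k in range(1, num_procs):
        # At step k, rank i sends to (i + k) mod P and receives from (i - k) mod P,
        # so each rank talks to exactly one peer in each direction per step.
        steps.append([((i + k) % num_procs, (i - k) % num_procs)
                      for i in range(num_procs)])
    return steps


def place_blocks(block_sizes: Dict[str, int],
                 owner: Dict[str, int],
                 used: List[int],
                 capacity: int) -> Dict[str, int]:
    """HYPOTHETICAL memory-aware placement (not the paper's algorithm):
    keep each block on its default owner while that worker stays within
    `capacity`; otherwise redirect it to the least-loaded worker."""
    used = list(used)  # copy so the caller's list is not mutated
    placement: Dict[str, int] = {}
    # Place large blocks first so they get first pick of free memory.
    for block in sorted(block_sizes, key=block_sizes.get, reverse=True):
        size = block_sizes[block]
        target = owner[block]
        if used[target] + size > capacity:
            # Default owner would overflow: fall back to the least-loaded worker.
            target = min(range(len(used)), key=used.__getitem__)
            if used[target] + size > capacity:
                raise MemoryError(f"no worker can hold block {block!r}")
        used[target] += size
        placement[block] = target
    return placement
```

For example, `pairwise_schedule(4)[0]` is `[(1, 3), (2, 0), (3, 1), (0, 2)]`: in the first step rank 0 sends to rank 1 and receives from rank 3, and so on. With `capacity=8`, placing blocks of sizes 6, 3 and 2 that all default to worker 0 keeps the 6- and 2-unit blocks on worker 0 and redirects the 3-unit block to worker 1.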

Host publication title: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Publication status: Published - 22 Jan 2019
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: 10 Dec 2018 to 13 Dec 2018
Country/Territory: United States

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems