Accelerating Spark RDD operations with local and remote GPU devices

Yasuhiro Ohno, Shin Morishima, Hiroki Matsutani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Apache Spark is a distributed processing framework for large-scale data sets, in which intermediate data sets are represented as RDDs (Resilient Distributed Datasets) and stored in memory distributed across machines. To accelerate its computation-intensive operations, such as reduction and sort, we focus on GPU devices. We modified the Spark framework to invoke CUDA kernels when computation-intensive operations are called. RDDs are transformed into array structures and transferred to GPU devices when necessary. Although RDDs should be cached in GPU device memory as much as possible to hide the data transfer overhead, the number of local GPU devices that can be mounted in a host machine is limited. In this paper, we propose using remote GPU devices connected to a host machine via PCI-Express over 10Gbps Ethernet technology. To mitigate the data transfer overhead for remote GPU devices, we propose three RDD caching policies for local and remote GPU devices. We implemented various reduction programs (e.g., Sum, Max, LineCount) and transformation programs (e.g., SortByKey, PatternMatch, WordConversion) for Spark using local and remote GPU devices. Evaluation results show that Spark with GPU outperforms the original software by up to 21.4x. We also evaluate the RDD caching policies for local and remote GPU devices and show that the policy that minimizes the amount of data transferred to remote GPU devices achieves the best performance.
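
The caching argument in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper: it assumes the raw 10 Gbps line rate as an upper bound on throughput (ignoring protocol and PCI-Express encapsulation overhead), and the RDD size and operation count are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Lower bound on transfer time: payload bits divided by raw link rate."""
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9)


def total_transfer_seconds(num_bytes: float, link_gbps: float,
                           iterations: int, cached: bool) -> float:
    """Total host-to-device transfer time over repeated operations on one RDD.

    With caching, the array is sent once and reused on the device;
    without caching, it is re-sent before every operation.
    """
    sends = 1 if cached else iterations
    return sends * transfer_seconds(num_bytes, link_gbps)


if __name__ == "__main__":
    rdd_bytes = 1.25e9  # hypothetical size of the array built from an RDD
    link = 10.0         # 10 Gbps Ethernet, raw line rate (assumed upper bound)
    runs = 8            # hypothetical number of operations on the same RDD
    print(total_transfer_seconds(rdd_bytes, link, runs, cached=False))
    print(total_transfer_seconds(rdd_bytes, link, runs, cached=True))
```

Under these assumptions the uncached case pays the transfer cost once per operation, while the cached case pays it once in total, which is why keeping RDDs resident in device memory matters more for remote GPU devices than for local ones.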

Original language: English
Title of host publication: Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Publisher: IEEE Computer Society
Pages: 791-799
Number of pages: 9
ISBN (Electronic): 9781509044573
DOI: https://doi.org/10.1109/ICPADS.2016.0108
Publication status: Published - 2017 Jan 18
Event: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016 - Wuhan, Hubei, China
Duration: 2016 Dec 13 to 2016 Dec 16

Other

Other: 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016
Country: China
City: Wuhan, Hubei
Period: 16/12/13 to 16/12/16

Keywords

  • Apache Spark
  • CUDA
  • GPU
  • PCIe over 10GbE
  • RDD

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Ohno, Y., Morishima, S., & Matsutani, H. (2017). Accelerating Spark RDD operations with local and remote GPU devices. In Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016 (pp. 791-799). [7823823] IEEE Computer Society. https://doi.org/10.1109/ICPADS.2016.0108

@inproceedings{3b4000dd8a34468fbc10d9c0061842aa,
title = "Accelerating Spark RDD operations with local and remote GPU devices",
keywords = "Apache Spark, CUDA, GPU, PCIe over 10GbE, RDD",
author = "Yasuhiro Ohno and Shin Morishima and Hiroki Matsutani",
year = "2017",
month = "1",
day = "18",
doi = "10.1109/ICPADS.2016.0108",
language = "English",
pages = "791--799",
booktitle = "Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Accelerating Spark RDD operations with local and remote GPU devices

AU - Ohno, Yasuhiro

AU - Morishima, Shin

AU - Matsutani, Hiroki

PY - 2017/1/18

Y1 - 2017/1/18

KW - Apache Spark

KW - CUDA

KW - GPU

KW - PCIe over 10GbE

KW - RDD

UR - http://www.scopus.com/inward/record.url?scp=85018520965&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018520965&partnerID=8YFLogxK

U2 - 10.1109/ICPADS.2016.0108

DO - 10.1109/ICPADS.2016.0108

M3 - Conference contribution

AN - SCOPUS:85018520965

SP - 791

EP - 799

BT - Proceedings - 22nd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2016

PB - IEEE Computer Society

ER -