An FPGA-based low-latency network processing for Spark Streaming

Kohei Nakamura, Ami Hayashi, Hiroki Matsutani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Low-latency stream data processing is a key enabler for on-line data analysis applications, such as detecting anomaly conditions and change points in stream data continuously generated by sensors and networking services. Existing stream processing frameworks are classified into micro-batch and one-at-a-time processing methodologies. Apache Spark Streaming employs the micro-batch methodology, in which data analysis is repeatedly performed on the series of data that arrives during a short time period, called a micro batch. The rich set of data analysis libraries provided by Spark, such as machine learning and graph processing, can be applied to these micro batches. However, a drawback of the micro-batch methodology is high latency in detecting anomaly conditions and change points, because data are first accumulated into a micro batch (e.g., 1 sec long) and the analysis is then performed on that batch. In this paper, we propose to offload one-at-a-time analysis functions onto an FPGA-based 10Gbit Ethernet network interface card (FPGA NIC) that cooperates with the Spark Streaming framework, in order to significantly reduce the processing latency and improve the processing throughput. We implemented word count and change-point detection applications on Spark Streaming with our FPGA NIC, in which the one-at-a-time analysis logic is implemented. Experimental results demonstrate that word count throughput is improved by 22x and change-point detection latency is reduced by 94.12% compared to the original Spark Streaming. Our approach can complement existing micro-batch data analysis frameworks with ultra-low-latency one-at-a-time analysis logic.
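The micro-batch baseline the abstract describes is the standard Spark Streaming word count: records are buffered until the batch interval (e.g., 1 second) closes, and only then counted, which is the source of the detection latency the paper targets. A minimal sketch in Scala, assuming a local master and a TCP text source on localhost:9999 (both illustrative placeholders, not details from the paper):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchWordCount {
  def main(args: Array[String]): Unit = {
    // Local master for testing only; a real deployment would pass a cluster URL.
    val conf = new SparkConf().setMaster("local[2]").setAppName("MicroBatchWordCount")

    // Records are accumulated into 1-second micro batches before any analysis runs;
    // this batch interval bounds the best-case detection latency from below.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Receive a text stream over TCP (host and port are illustrative).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Classic word count, evaluated once per micro batch.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

The paper's offload, by contrast, performs the per-record (one-at-a-time) analysis inside the FPGA NIC as packets arrive, so results are available before any micro batch closes; the sketch above is only the software baseline being compared against.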

Original language: English
Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2410-2415
Number of pages: 6
ISBN (Electronic): 9781467390040
DOI: 10.1109/BigData.2016.7840876
Publication status: Published - 2017 Feb 2
Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
Duration: 2016 Dec 5 - 2016 Dec 8

Other

Other: 4th IEEE International Conference on Big Data, Big Data 2016
Country: United States
City: Washington
Period: 16/12/5 - 16/12/8


ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Hardware and Architecture

Cite this

Nakamura, K., Hayashi, A., & Matsutani, H. (2017). An FPGA-based low-latency network processing for Spark Streaming. In Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 2410-2415). [7840876] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2016.7840876
