Performing external join operator on PostgreSQL with data transfer approach

Ryota Takizawa, Hideyuki Kawashima, Ryuya Mitsuhashi, Osamu Tatebe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the development of sensing devices, the volume of data managed by human beings has been increasing rapidly. Relational database management systems (RDBMSs) play a key role in managing such huge data; an RDBMS models real-world data as n-ary relational tables. The join is one of the most important relational operators, and its acceleration has been studied widely and in depth over the past decade, so many techniques have already been proposed. The problem we face is how to actually use such techniques in real RDBMSs: how can an RDBMS provide such an efficient join operator? We propose to implement an efficient join technique through a data transfer approach. The approach places a hook point inside the RDBMS internals, pulls data streams from the RDBMS's operator pipeline, applies our own join operator to the data, and finally returns the result to the operator pipeline. In our experiments, the proposed method achieved a 1.42x speedup compared with PostgreSQL. Our code is available on GitHub.
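The data transfer approach summarized above can be illustrated with a toy Volcano-style (iterator) pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the operator names (`SeqScan`, `ExternalJoinHook`), the rows-as-Python-tuples representation, and the choice of a hash-partitioned parallel hash join as the external operator (suggested only by the "Parallel Hash Join" keyword) are all hypothetical, and no real PostgreSQL executor interface is modeled. The hook node pulls both child streams out of the pipeline, hands them to an external join function, and yields the joined rows back to whatever operator sits above it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical row representation: a plain Python tuple (not PostgreSQL's).
Row = Tuple
JoinFn = Callable[[List[Row], List[Row], int, int], List[Row]]


class SeqScan:
    """Leaf operator (hypothetical): yields rows of an in-memory table."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = rows

    def __iter__(self) -> Iterator[Row]:
        yield from self.rows


def hash_join_partition(build: List[Row], probe: List[Row],
                        bkey: int, pkey: int) -> List[Row]:
    """Build a hash table on one build-side partition, then probe it.
    Each output row is probe_row + build_row."""
    table: Dict[object, List[Row]] = {}
    for r in build:
        table.setdefault(r[bkey], []).append(r)
    return [s + r for s in probe for r in table.get(s[pkey], [])]


def parallel_hash_join(build: List[Row], probe: List[Row],
                       bkey: int, pkey: int, n_parts: int = 4) -> List[Row]:
    """Partitioned hash join: rows with equal keys land in the same
    partition on both sides, so partition pairs are joined independently."""
    bparts: List[List[Row]] = [[] for _ in range(n_parts)]
    pparts: List[List[Row]] = [[] for _ in range(n_parts)]
    for r in build:
        bparts[hash(r[bkey]) % n_parts].append(r)
    for s in probe:
        pparts[hash(s[pkey]) % n_parts].append(s)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        parts = pool.map(hash_join_partition, bparts, pparts,
                         [bkey] * n_parts, [pkey] * n_parts)
        return [row for part in parts for row in part]


class ExternalJoinHook:
    """Hypothetical hook node standing in for a join in the plan: it pulls
    both child streams out of the pipeline, runs an external join function
    on them, and streams the result back to its parent operator."""

    def __init__(self, outer: Iterable[Row], inner: Iterable[Row],
                 outer_key: int, inner_key: int,
                 join_fn: JoinFn = parallel_hash_join) -> None:
        self.outer, self.inner = outer, inner
        self.outer_key, self.inner_key = outer_key, inner_key
        self.join_fn = join_fn

    def __iter__(self) -> Iterator[Row]:
        build = list(self.inner)  # pull the whole inner (build) stream
        probe = list(self.outer)  # pull the whole outer (probe) stream
        # Hand the data to the external operator, then return rows upward.
        yield from self.join_fn(build, probe, self.inner_key, self.outer_key)


if __name__ == "__main__":
    customers = SeqScan([(1, "alice"), (2, "bob"), (3, "carol")])
    orders = SeqScan([(1, "pen"), (1, "ink"), (3, "cup")])
    plan = ExternalJoinHook(outer=customers, inner=orders,
                            outer_key=0, inner_key=0)
    print(sorted(plan))
    # [(1, 'alice', 1, 'ink'), (1, 'alice', 1, 'pen'), (3, 'carol', 3, 'cup')]
```

Materializing the inner (build) side is inherent to a hash join; the sketch also materializes the outer side so the external function receives plain lists, although a streaming probe would also be possible. In CPython the worker threads share the global interpreter lock, so the example shows the structure of partition-parallel joining rather than an actual speedup.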

Original language: English
Title of host publication: Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2018
Publisher: Association for Computing Machinery
Pages: 271-277
Number of pages: 7
ISBN (Electronic): 9781450353724
DOI: https://doi.org/10.1145/3149457.3149480
Publication status: Published - 2018 Jan 28
Externally published: Yes
Event: 2018 International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2018 - Tokyo, Japan
Duration: 2018 Jan 28 - 2018 Jan 31

Other

Other: 2018 International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2018
Country: Japan
City: Tokyo
Period: 18/1/28 - 18/1/31

Fingerprint

Data transfer
Pipelines
Hooks
Experiments

Keywords

  • Parallel Hash Join
  • PostgreSQL
  • Relational database

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Takizawa, R., Kawashima, H., Mitsuhashi, R., & Tatebe, O. (2018). Performing external join operator on PostgreSQL with data transfer approach. In Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2018 (pp. 271-277). Association for Computing Machinery. https://doi.org/10.1145/3149457.3149480

@inproceedings{4ef34331a2f246059f904f32d16aecf2,
title = "Performing external join operator on PostgreSQL with data transfer approach",
abstract = "With the development of sensing devices, the size of data managed by human being has been rapidly increasing. To manage such huge data, relational database management system (RDBMS) plays a key role. RDBMS models the real world data as n-ary relational tables. Join operator is one of the most important relational operators, and its acceleration has been studied widely and deeply. How can an RDBMS provide such an efficient join operator? The performance improvement of join operator has been deeply studied for a decade, and many techniques are proposed already. The problem that we face is how to actually use such excellent techniques in real RDBMSs. We propose to implement an efficient join technique by the data transfer approach. The approach makes a hook point inside an RDBMS internal, and pulls data streams from the operator pipeline in the RDBMS, and applies our original join operator to the data, and finally returns the result to the operator pipeline in the RDBMS. The result of the experiment showed that our proposed method achieved 1.42x speedup compared with PostgreSQL. Our code is available on GitHub.",
keywords = "Parallel Hash Join, PostgreSQL, Relational database",
author = "Ryota Takizawa and Hideyuki Kawashima and Ryuya Mitsuhashi and Osamu Tatebe",
year = "2018",
month = "1",
day = "28",
doi = "10.1145/3149457.3149480",
language = "English",
pages = "271--277",
booktitle = "Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2018",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Performing external join operator on PostgreSQL with data transfer approach

AU - Takizawa, Ryota

AU - Kawashima, Hideyuki

AU - Mitsuhashi, Ryuya

AU - Tatebe, Osamu

PY - 2018/1/28

Y1 - 2018/1/28

AB - With the development of sensing devices, the size of data managed by human being has been rapidly increasing. To manage such huge data, relational database management system (RDBMS) plays a key role. RDBMS models the real world data as n-ary relational tables. Join operator is one of the most important relational operators, and its acceleration has been studied widely and deeply. How can an RDBMS provide such an efficient join operator? The performance improvement of join operator has been deeply studied for a decade, and many techniques are proposed already. The problem that we face is how to actually use such excellent techniques in real RDBMSs. We propose to implement an efficient join technique by the data transfer approach. The approach makes a hook point inside an RDBMS internal, and pulls data streams from the operator pipeline in the RDBMS, and applies our original join operator to the data, and finally returns the result to the operator pipeline in the RDBMS. The result of the experiment showed that our proposed method achieved 1.42x speedup compared with PostgreSQL. Our code is available on GitHub.

KW - Parallel Hash Join

KW - PostgreSQL

KW - Relational database

UR - http://www.scopus.com/inward/record.url?scp=85044384770&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044384770&partnerID=8YFLogxK

U2 - 10.1145/3149457.3149480

DO - 10.1145/3149457.3149480

M3 - Conference contribution

SP - 271

EP - 277

BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2018

PB - Association for Computing Machinery

ER -