Near-memory data transformation for efficient sparse matrix multi-vector multiplication

Daichi Fujiki, Niladrish Chatterjee, Donghyuk Lee, Mike O'Connor

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Citations (Scopus)

Abstract

Efficient manipulation of sparse matrices is critical to a wide range of HPC applications. Increasingly, GPUs are used to accelerate these sparse matrix operations. We study one common operation, Sparse Matrix Multi-Vector Multiplication (SpMM), and evaluate the impact of the sparsity, distribution of non-zero elements, and tile-traversal strategies on GPU implementations. Using these insights, we determine that operating on these sparse matrices in a Densified Compressed Sparse Row (DCSR) is well-suited to the parallel warp-synchronous execution model of the GPU processing elements. Preprocessing or storing the sparse matrix in the DCSR format, however, often requires significantly more memory storage than conventional Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) formats. Given that SpMM kernels are often bottlenecked on DRAM bandwidth, the increase in DRAM traffic to access the larger DCSR formatted data structure can result in a slowdown for many matrices. We propose a near-memory transform engine to dynamically create DCSR formatted tiles for the GPU processing elements from the CSC formatted matrix in memory. This work enhances a GPU's last-level cache/memory controller unit to act as an efficient translator between the compute-optimized representation of data and its corresponding storage/bandwidth-optimized format to accelerate sparse workloads. Our approach achieves 2.26× better performance on average compared to the vendor supplied optimized library for sparse matrix operations, cuSPARSE.

Original languageEnglish
Title of host publicationProceedings of SC 2019
Subtitle of host publicationThe International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherIEEE Computer Society
ISBN (Electronic)9781450362290
DOIs
Publication statusPublished - 2019 Nov 17
Externally publishedYes
Event2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 - Denver, United States
Duration: 2019 Nov 172019 Nov 22

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Conference

Conference2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
Country/TerritoryUnited States
CityDenver
Period19/11/1719/11/22

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Fingerprint

Dive into the research topics of 'Near-memory data transformation for efficient sparse matrix multi-vector multiplication'. Together they form a unique fingerprint.

Cite this