TY - GEN
T1 - Duality cache for data parallel acceleration
AU - Fujiki, Daichi
AU - Mahlke, Scott
AU - Das, Reetuparna
N1 - Funding Information:
We thank members of the M-Bits research group and the anonymous reviewers for their feedback. This work was supported in part by the NSF under the CAREER-1652294, XPS-1628991, and SHF-1763918 awards, and by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.
Publisher Copyright:
© 2019 ACM.
PY - 2019/6/22
Y1 - 2019/6/22
AB - Duality Cache is an in-cache computation architecture that enables general-purpose data-parallel applications to run on caches. This paper presents a holistic approach to building the Duality Cache system stack, with techniques for performing in-cache floating-point arithmetic and transcendental functions, enabling a data-parallel execution model, designing a compiler that accepts existing CUDA programs, and providing flexibility in adapting to various workload characteristics. Exposure to the massive parallelism that exists in the Duality Cache architecture improves the performance of GPU benchmarks by 3.6× and of OpenACC benchmarks by 4.0× over a server-class GPU. Re-purposing existing caches provides 72.6× better performance for CPUs at only 3.5% area cost. Duality Cache reduces energy by 5.8× over GPUs and 21× over CPUs.
UR - http://www.scopus.com/inward/record.url?scp=85069509205&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069509205&partnerID=8YFLogxK
U2 - 10.1145/3307650.3322257
DO - 10.1145/3307650.3322257
M3 - Conference contribution
AN - SCOPUS:85069509205
T3 - Proceedings - International Symposium on Computer Architecture
SP - 397
EP - 410
BT - ISCA 2019 - Proceedings of the 2019 46th International Symposium on Computer Architecture
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 46th International Symposium on Computer Architecture, ISCA 2019
Y2 - 22 June 2019 through 26 June 2019
ER -