TY - GEN
T1 - In-memory data parallel processor
AU - Fujiki, Daichi
AU - Mahlke, Scott
AU - Das, Reetuparna
N1 - Funding Information:
We thank members of the M-Bits research group and the anonymous reviewers for their feedback. This work was supported in part by the NSF under the CAREER-1652294 award and the XPS-1628991 award, and by C-FAR, one of the six SRC STARnet centers sponsored by MARCO and DARPA.
Publisher Copyright:
© 2018 Copyright held by the owner/author(s).
PY - 2018/3/19
Y1 - 2018/3/19
N2 - Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5× speedup over a multi-core CPU server for a set of applications from Parsec and 763× speedup over a server-class GPU for a set of Rodinia benchmarks.
AB - Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5× speedup over a multi-core CPU server for a set of applications from Parsec and 763× speedup over a server-class GPU for a set of Rodinia benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=85060079792&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060079792&partnerID=8YFLogxK
U2 - 10.1145/3173162.3173171
DO - 10.1145/3173162.3173171
M3 - Conference contribution
AN - SCOPUS:85060079792
VL - 53
SP - 1
EP - 14
BT - Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018
PB - Association for Computing Machinery
T2 - 23rd International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018
Y2 - 24 March 2018 through 28 March 2018
ER -