TY - GEN
T1 - Vegeta
T2 - 2nd International Conference on Networking and Computing, ICNC 2011
AU - Shitara, Akihiro
AU - Nakahama, Tetsuya
AU - Yamada, Masahiro
AU - Kamata, Toshiaki
AU - Nishikawa, Yuri
AU - Yoshimi, Masato
AU - Amano, Hideharu
PY - 2011
Y1 - 2011
N2 - Programming a cluster equipped with accelerators such as GP-GPUs tends to require mixing an intra-node parallel library based on CUDA or OpenCL with an inter-node communication library such as MPI. In this work, we proposed, implemented, and evaluated VEGETA, a middleware that can inject OpenCL program tasks written for multiple OpenCL accelerators in a single chassis into multiple OpenCL accelerators installed across multiple chassis. We also added a new feature, the Virtual Direct Memory Access (VDMA) scheme, which supports direct data transfer to another node without writing the data back to the memory region of the user application. A matrix multiplication benchmark executed on two, three, and four nodes achieved performance improvements of 1.9, 2.8, and 3.8 times, respectively. Furthermore, in an advection term computation based on the Cartesian grid method, the system achieved 78% of the performance of the MPI version even without VDMA, and 96% with VDMA.
AB - Programming a cluster equipped with accelerators such as GP-GPUs tends to require mixing an intra-node parallel library based on CUDA or OpenCL with an inter-node communication library such as MPI. In this work, we proposed, implemented, and evaluated VEGETA, a middleware that can inject OpenCL program tasks written for multiple OpenCL accelerators in a single chassis into multiple OpenCL accelerators installed across multiple chassis. We also added a new feature, the Virtual Direct Memory Access (VDMA) scheme, which supports direct data transfer to another node without writing the data back to the memory region of the user application. A matrix multiplication benchmark executed on two, three, and four nodes achieved performance improvements of 1.9, 2.8, and 3.8 times, respectively. Furthermore, in an advection term computation based on the Cartesian grid method, the system achieved 78% of the performance of the MPI version even without VDMA, and 96% with VDMA.
KW - GPU
KW - OpenCL
KW - middleware
UR - http://www.scopus.com/inward/record.url?scp=84856944083&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856944083&partnerID=8YFLogxK
U2 - 10.1109/ICNC.2011.28
DO - 10.1109/ICNC.2011.28
M3 - Conference contribution
AN - SCOPUS:84856944083
SN - 9780769545691
T3 - Proceedings - 2011 2nd International Conference on Networking and Computing, ICNC 2011
SP - 141
EP - 147
BT - Proceedings - 2011 2nd International Conference on Networking and Computing, ICNC 2011
Y2 - 30 November 2011 through 2 December 2011
ER -