Due to recent advances in Deep Neural Network (DNN) technologies, recognition and inference applications are expected to run on mobile embedded systems. Developing high-performance and power-efficient DNN engines has become one of the important challenges for embedded systems. Since DNN algorithms and structures are frequently updated, flexibility and performance scalability to deal with various types of networks are crucial requirements of DNN accelerator design. In this paper, we describe the architecture and LSI design of a flexible and scalable CNN accelerator called SNACC (Scalable Neuro Accelerator Core with Cubic integration), which consists of several processing cores, on-chip memory modules, and a ThruChip Interface (TCI). We evaluate the scalability of SNACC with detailed simulation, varying the number of cores and the off-chip memory access bandwidth. The results show that the energy efficiency of the accelerator is highest in the eight-core configuration with 500 MB/s of off-chip bandwidth.