QUEST

Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS

Kodai Ueyoshi, Kota Ando, Kazutoshi Hirose, Shinya Takamaeda-Yamazaki, Mototsugu Hamada, Tadahiro Kuroda, Masato Motomura

Research output: Contribution to journal › Article

Abstract

QUEST is a programmable multiple-instruction, multiple-data (MIMD) parallel accelerator for general-purpose, state-of-the-art deep neural networks (DNNs). It features die-to-die stacking of eight SRAM dies (96 MB in total) with three-cycle latency and 28.8-GB/s bandwidth, using an inductive coupling technology called the ThruChip interface (TCI). Stacking SRAMs instead of DRAMs promises lower memory access latency and simpler hardware, which helps balance memory capacity, latency, and bandwidth, all of which cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized processing with programmable bit precision, enabling faster computation of larger DNNs within a 3-D module; compared with linear quantization, it sustains higher recognition accuracy in the low-bit-width region. The prototype QUEST chip, fabricated in 40-nm CMOS technology, achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.
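
The "log-quantized" processing the abstract refers to represents weights as signed powers of two, so each multiply in a DNN layer reduces to a bit shift, allowing shifters in place of multipliers. A minimal Python sketch of the idea (an illustration only, with an assumed exponent clamp; not the QUEST chip's actual quantization scheme):

```python
import math

def log_quantize(w, bits=4):
    """Round a weight to the nearest signed power of two.

    A minimal sketch of logarithmic quantization, not the QUEST
    chip's exact scheme. Assumes |w| < 1, as is typical for
    trained DNN weights, so exponents are negative.
    """
    if w == 0.0:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    # Round log2 of the magnitude to the nearest integer exponent.
    exp = round(math.log2(abs(w)))
    # Clamp to exponents representable with the given bit width.
    exp = max(min(exp, -1), -(2 ** (bits - 1)))
    return sign * 2.0 ** exp

# Each quantized weight is a power of two, so w * x becomes x >> (-exp).
print([log_quantize(w) for w in (0.4, -0.07, 0.012)])
```

Because quantization levels are spaced exponentially, small-magnitude weights (which dominate trained DNNs) keep proportionally finer resolution than under linear quantization, which is one intuition for the accuracy advantage at low bit widths.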

Original language: English
Journal: IEEE Journal of Solid-State Circuits
DOI: 10.1109/JSSC.2018.2871623
Publication status: Accepted/In press - 2018 Jan 1

Keywords

  • Accelerator
  • deep learning
  • deep neural networks (DNNs)
  • logarithmic-quantized neural networks
  • processor architecture

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS. / Ueyoshi, Kodai; Ando, Kota; Hirose, Kazutoshi; Takamaeda-Yamazaki, Shinya; Hamada, Mototsugu; Kuroda, Tadahiro; Motomura, Masato.

In: IEEE Journal of Solid-State Circuits, 01.01.2018.

Ueyoshi, Kodai ; Ando, Kota ; Hirose, Kazutoshi ; Takamaeda-Yamazaki, Shinya ; Hamada, Mototsugu ; Kuroda, Tadahiro ; Motomura, Masato. / QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS. In: IEEE Journal of Solid-State Circuits. 2018.
@article{2faafcbc75f240d3be250cb421691367,
title = "QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS",
abstract = "QUEST is a programmable multiple-instruction, multiple-data (MIMD) parallel accelerator for general-purpose, state-of-the-art deep neural networks (DNNs). It features die-to-die stacking of eight SRAM dies (96 MB in total) with three-cycle latency and 28.8-GB/s bandwidth, using an inductive coupling technology called the ThruChip interface (TCI). Stacking SRAMs instead of DRAMs promises lower memory access latency and simpler hardware, which helps balance memory capacity, latency, and bandwidth, all of which cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized processing with programmable bit precision, enabling faster computation of larger DNNs within a 3-D module; compared with linear quantization, it sustains higher recognition accuracy in the low-bit-width region. The prototype QUEST chip, fabricated in 40-nm CMOS technology, achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.",
keywords = "Accelerator, deep learning, deep neural networks (DNNs), logarithmic-quantized neural networks, processor architecture",
author = "Kodai Ueyoshi and Kota Ando and Kazutoshi Hirose and Shinya Takamaeda-Yamazaki and Mototsugu Hamada and Tadahiro Kuroda and Masato Motomura",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/JSSC.2018.2871623",
language = "English",
journal = "IEEE Journal of Solid-State Circuits",
issn = "0018-9200",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - QUEST

T2 - Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS

AU - Ueyoshi, Kodai

AU - Ando, Kota

AU - Hirose, Kazutoshi

AU - Takamaeda-Yamazaki, Shinya

AU - Hamada, Mototsugu

AU - Kuroda, Tadahiro

AU - Motomura, Masato

PY - 2018/1/1

Y1 - 2018/1/1

N2 - QUEST is a programmable multiple-instruction, multiple-data (MIMD) parallel accelerator for general-purpose, state-of-the-art deep neural networks (DNNs). It features die-to-die stacking of eight SRAM dies (96 MB in total) with three-cycle latency and 28.8-GB/s bandwidth, using an inductive coupling technology called the ThruChip interface (TCI). Stacking SRAMs instead of DRAMs promises lower memory access latency and simpler hardware, which helps balance memory capacity, latency, and bandwidth, all of which cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized processing with programmable bit precision, enabling faster computation of larger DNNs within a 3-D module; compared with linear quantization, it sustains higher recognition accuracy in the low-bit-width region. The prototype QUEST chip, fabricated in 40-nm CMOS technology, achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.

AB - QUEST is a programmable multiple-instruction, multiple-data (MIMD) parallel accelerator for general-purpose, state-of-the-art deep neural networks (DNNs). It features die-to-die stacking of eight SRAM dies (96 MB in total) with three-cycle latency and 28.8-GB/s bandwidth, using an inductive coupling technology called the ThruChip interface (TCI). Stacking SRAMs instead of DRAMs promises lower memory access latency and simpler hardware, which helps balance memory capacity, latency, and bandwidth, all of which cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized processing with programmable bit precision, enabling faster computation of larger DNNs within a 3-D module; compared with linear quantization, it sustains higher recognition accuracy in the low-bit-width region. The prototype QUEST chip, fabricated in 40-nm CMOS technology, achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.

KW - Accelerator

KW - deep learning

KW - deep neural networks (DNNs)

KW - logarithmic-quantized neural networks

KW - processor architecture

UR - http://www.scopus.com/inward/record.url?scp=85055053908&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055053908&partnerID=8YFLogxK

U2 - 10.1109/JSSC.2018.2871623

DO - 10.1109/JSSC.2018.2871623

M3 - Article

JO - IEEE Journal of Solid-State Circuits

JF - IEEE Journal of Solid-State Circuits

SN - 0018-9200

ER -