A 22nm 8Mb STT-MRAM Near-Memory-Computing Macro with 8b-Precision and 46.4-160.1TOPS/W for Edge-AI Devices

Bibliographic Details
Main Authors: Chiu, Yen-Cheng; Khwa, Win-San; Li, Chung-Yuan; Hsieh, Fang-Ling; Chien, Yu-An; Lin, Guan-Yi; Chen, Po-Jung; Pan, Tsen-Hsiang; You, De-Qi; Chen, Fang-Yi; Lee, Andrew; Lo, Chung-Chuan; Liu, Ren-Shuo; Hsieh, Chih-Cheng; Tang, Kea-Tiong; Chih, Yu-Der; Chang, Tsung-Yung; Chang, Meng-Fan
Format: Conference Proceeding
Language: English
Description
Summary: Nonvolatile-memory-based computing-in-memory (nvCIM) [1-6] is ideal for low-power edge-AI devices requiring neural network (NN) parameter storage in the power-off mode, a rapid response to device wake-up, and high energy efficiency for MAC operations ($\text{EF}_{\text{MAC}}$). Current analog nvCIMs impose a tradeoff between the signal margin (SM) and the number of accumulations ($N_{\text{ACU}}$) per cycle versus $\text{EF}_{\text{MAC}}$ and computing latency ($T_{\text{CD-MAC}}$). Near-memory computing (NMC), with high precision for inputs (IN), weights (W), and outputs (OUT), and a high $N_{\text{ACU}}$, is a trend to improve $\text{EF}_{\text{MAC}}$, $T_{\text{CD-MAC}}$, and accuracy. A prior STT-MRAM NMC [1] uses vertical-weight mapping (VWM) to improve $\text{EF}_{\text{MAC}}$; however, further improvement is challenging due to: (1) the large energy consumption in reading repetitious weight data across multiple inputs for a single NN layer; (2) a high bitstream toggling rate (BTR) in the digital MAC circuits ($\text{DC}_{\text{MAC}}$), which reduces $\text{EF}_{\text{MAC}}$; and (3) a limited SM and long memory readout latency ($T_{\text{CD-M}}$) for memories with a small R-ratio (e.g., STT-MRAM; see Fig. 33.2.1). In developing an STT-MRAM nvCIM macro, this work moves beyond circuit-level novelty by using system-software-circuit co-design.
This work achieves a high $\text{EF}_{\text{MAC}}$, a short $T_{\text{CD-M}}$, a high read bandwidth (R-BW), a high IN-W-OUT precision, and a high $N_{\text{ACU}}$ by using three novel schemes: (1) a hardware-based weight-feature-aware read (WFAR) to reduce weight accesses and improve $\text{EF}_{\text{MAC}}$ with minimal area overhead; (2) toggling-aware weight tuning (TAWT), based on VWM, to obtain fine-tuned weights ($W_{\text{FT}}$) with a low BTR and thereby enhance the $\text{EF}_{\text{MAC}}$ of the $\text{DC}_{\text{MAC}}$; and (3) a differential charge-accumulating margin-enhanced voltage-sensing amplifier (DCME-VSA) to enhance the SM while reducing $T_{\text{CD-M}}$. The proposed 22-nm 8-Mb STT-MRAM NMC nvCIM macro achieves the highest R-BW (436 GB/s) and $\text{EF}_{\text{MAC}}$ (46.4-160.1 TOPS/W) for $N_{\text{ACU}} = 576$ with 8b-IN, 8b-W, and 26b-OUT precision.
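The output width quoted above can be sanity-checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes unsigned 8-bit inputs and weights (the summary does not state signedness), and the function name `accumulator_bits` is hypothetical. It computes how many bits are needed to hold the worst-case sum of 576 products.

```python
def accumulator_bits(in_bits: int, w_bits: int, n_acu: int) -> int:
    """Bits needed for the worst-case sum of n_acu unsigned products.

    Assumption (not from the source): inputs and weights are unsigned
    integers of in_bits and w_bits bits respectively.
    """
    # Largest single product of an unsigned input and an unsigned weight.
    max_product = (2**in_bits - 1) * (2**w_bits - 1)
    # Largest possible accumulated value over n_acu products.
    max_sum = n_acu * max_product
    # Number of bits required to represent max_sum.
    return max_sum.bit_length()


if __name__ == "__main__":
    print(accumulator_bits(8, 8, 576))
```

Under the unsigned assumption this gives 26 bits, which is consistent with the 26b output precision in the summary. Signed operands could need fewer bits, so this is a plausibility check rather than the authors' stated derivation.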
ISSN: 2376-8606
DOI: 10.1109/ISSCC42615.2023.10067563