MF-Net: Compute-In-Memory SRAM for Multibit Precision Inference Using Memory-Immersed Data Conversion and Multiplication-Free Operators

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 2021-05, Vol. 68 (5), p. 1966-1978
Main Authors: Nasrin, Shamma, Badawi, Diaa, Cetin, Ahmet Enis, Gomes, Wilfred, Trivedi, Amit Ranjan
Format: Article
Language: English
Description
Summary: We propose a co-design approach for compute-in-memory inference for deep neural networks (DNNs). We use multiplication-free function approximators based on the $\ell_1$ norm along with a co-adapted processing array and compute flow. With this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multibit-precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multibit-precision weights, does not require DACs, and easily scales to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC), which exploits the parasitic capacitance of the SRAM array's bit lines as a capacitive DAC. Since the dominant area overhead in an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of the within-SRAM SA-ADC. Our 8×62 SRAM macro, which requires a 5-bit ADC, achieves ~105 tera operations per second per watt (TOPS/W) with 8-bit input/weight processing at 45 nm CMOS. Our 8×30 SRAM macro, which requires a 4-bit ADC, achieves ~84 TOPS/W. SRAM macros that require lower ADC precision are more tolerant of process variability, but they also achieve lower TOPS/W. We evaluated the accuracy and performance of the proposed network on the MNIST, CIFAR10, and CIFAR100 datasets, choosing network configurations that adaptively mix multiplication-free and regular operators. The selected configurations use the multiplication-free operator for more than 85% of the total operations and are 98.6% accurate on MNIST, 90.2% on CIFAR10, and 66.9% on CIFAR100. Since most operations in these configurations map to the proposed SRAM macros, the efficiency benefits of our compute-in-memory approach translate broadly to the system level.
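
The record does not spell out the multiplication-free operator. As a minimal sketch, assuming the $\ell_1$-norm-inducing operator from the authors' earlier multiplication-free work, a ⊕ b = sgn(a)·b + sgn(b)·a (so that x ⊕ x = 2|x|, which is why the induced correlation is $\ell_1$-flavored), a "dot product" built from it could look like the following. The names mf_op and mf_dot are illustrative, and in hardware the sgn(·) products are conditional negations rather than true multiplies:

```python
import numpy as np

def mf_op(a, b):
    # Assumed multiplication-free operator: a (+) b = sgn(a)*b + sgn(b)*a,
    # equivalent to sgn(a*b) * (|a| + |b|). Only sign flips and additions;
    # np.sign(.) * x stands in for a conditional negation in hardware.
    return np.sign(a) * b + np.sign(b) * a

def mf_dot(w, x):
    # l1-flavored correlation replacing the usual inner product sum(w * x).
    return np.sum(mf_op(w, x))

# Toy check: mf_dot(x, x) equals twice the l1 norm of x.
x = np.array([0.5, -1.0, 2.0])
assert np.isclose(mf_dot(x, x), 2 * np.abs(x).sum())
```

Similarly, the SRAM-immersed SA-ADC performs an ordinary successive-approximation (binary) search; the point claimed in the abstract is that the binary-weighted capacitive DAC is realized by bit-line parasitic capacitance rather than dedicated capacitors. A behavioral sketch, with the capacitor network abstracted as an ideal DAC (sa_adc and its parameters are illustrative):

```python
def sa_adc(v_in, v_ref, n_bits=5):
    # Behavioral model of an n-bit successive-approximation ADC.
    # Per the abstract, the binary-weighted capacitive DAC would be formed
    # from SRAM bit-line parasitics; here it is idealized as
    # v_dac = v_ref * code / 2**n_bits.
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)            # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)
        if v_in >= v_dac:                    # comparator decision
            code = trial                     # keep the bit
    return code
```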
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2021.3064033