Wino Vidi Vici: Conquering Numerical Instability of 8-bit Winograd Convolution for Accurate Inference Acceleration on Edge

Bibliographic Details
Main Authors: Mori, Pierpaolo, Frickenstein, Lukas, Sampath, Shambhavi Balamuthu, Thoma, Moritz, Fasfous, Nael, Vemparala, Manoj Rohit, Frickenstein, Alexander, Unger, Christian, Stechele, Walter, Mueller-Gritschneder, Daniel, Passerone, Claudio
Format: Conference Proceeding
Language: English
Description
Summary: Winograd-based convolution can reduce the total number of operations needed for convolutional neural network (CNN) inference on edge devices. Most edge hardware accelerators use low-precision, 8-bit integer arithmetic units to improve energy efficiency and latency. This makes CNN quantization a critical step before deploying the model on such an edge device. To extract the benefits of fast Winograd-based convolution and efficient integer quantization, the two approaches must be combined. Research has shown that the transform required to execute convolutions in the Winograd domain results in numerical instability and severe accuracy degradation when combined with quantization, making the two techniques incompatible on edge hardware. This paper proposes a novel training scheme to achieve efficient Winograd-accelerated, quantized CNNs. 8-bit quantization is applied to all intermediate results of the Winograd convolution without sacrificing task-related accuracy. This is achieved by introducing clipping factors in the intermediate quantization stages and by using the complex number system to improve the transform. We achieve 2.8× and 2.1× reductions in MAC operations on ResNet-20 (CIFAR-10) and ResNet-18 (ImageNet), respectively, with no accuracy degradation.
ISSN: 2642-9381
DOI: 10.1109/WACV57701.2024.00013
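
For concreteness, below is a minimal sketch of the kind of pipeline the summary describes: one Winograd F(2x2, 3x3) tile, Y = A^T[(G g G^T) ⊙ (B^T d B)]A, with symmetric 8-bit fake quantization applied to every intermediate result and a clipping factor at each quantization stage. This is not the paper's method: the function names and fixed clipping values are illustrative assumptions, the clipping factors in the paper are trained rather than fixed, and the complex-domain transform it uses to improve numerical stability is not reproduced here.

```python
import numpy as np

# Winograd F(2x2, 3x3) transform matrices (Lavin & Gray, 2016):
# Y = A_T @ ((G @ g @ G.T) * (B_T @ d @ B_T.T)) @ A_T.T
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def fake_quant_int8(x, clip):
    # Symmetric 8-bit fake quantization: clip values to [-clip, clip],
    # round onto the int8 grid, then dequantize. In the paper the
    # clipping factors are learned; here they are fixed, illustrative
    # constants.
    scale = clip / 127.0
    q = np.clip(np.round(np.clip(x, -clip, clip) / scale), -127, 127)
    return q * scale

def winograd_tile_quant(d, g, clip_in=4.0, clip_w=2.0, clip_prod=8.0):
    # One F(2x2, 3x3) output tile with every intermediate result
    # quantized to 8 bits, mirroring the intermediate quantization
    # stages the summary mentions (real-valued transform only).
    V = fake_quant_int8(B_T @ d @ B_T.T, clip_in)   # input transform
    U = fake_quant_int8(G @ g @ G.T, clip_w)        # weight transform
    M = fake_quant_int8(U * V, clip_prod)           # element-wise product
    return A_T @ M @ A_T.T                          # 2x2 output tile

# Sanity check against direct 3x3 correlation on a 4x4 input tile.
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4)).astype(np.float32)
g = rng.standard_normal((3, 3)).astype(np.float32)
ref = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                for i in range(2)])
print(winograd_tile_quant(d, g))  # close to ref, up to quantization error
print(ref)
```

Without the fake-quantization calls the tile result matches the direct convolution exactly; inserting them exposes the quantization error of the intermediate transforms, which is the instability the paper's trained clipping factors and complex-domain transform are designed to control.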