Fast convolutional neural networks on FPGAs with hls4ml

Bibliographic Details
Published in:	Machine Learning: Science and Technology, 2021-12, Vol. 2 (4), article 045015
Main Authors:	Aarrestad, Thea; Loncar, Vladimir; Ghielmetti, Nicolò; Pierini, Maurizio; Summers, Sioni; Ngadiuba, Jennifer; Petersson, Christoffer; Linander, Hampus; Iiyama, Yutaro; Di Guglielmo, Giuseppe; Duarte, Javier; Harris, Philip; Rankin, Dylan; Jindariani, Sergo; Pedro, Kevin; Tran, Nhan; Liu, Mia; Kreinar, Edward; Wu, Zhenbin; Hoang, Duc
Format:	Article
Language:	English
Subjects:	Accuracy; Artificial neural networks; convolutional neural network; deep learning; Field programmable gate arrays; FPGA; INSTRUMENTATION RELATED TO NUCLEAR SCIENCE AND TECHNOLOGY; Large Hadron Collider; Model accuracy; Neural networks; PHYSICS OF ELEMENTARY PARTICLES AND FIELDS; Radiation counters; Resource utilization

Description
Summary:	We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
ISSN:	2632-2153
DOI:	10.1088/2632-2153/ac0ea1
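
Note:	The summary names pruning and quantization as the two compression methods. The sketch below illustrates the general idea of each in plain Python; it is not the authors' implementation and does not use the hls4ml API. The function names `prune_by_magnitude` and `quantize_fixed_point`, the example weights, and the choice of rounding with saturation are all assumptions made for illustration. It shows only the numerical effect on a list of weights and performs no training, so it is not quantization-aware training itself.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Hypothetical helper: a one-shot magnitude pruning step, for illustration.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_fixed_point(x, total_bits, int_bits):
    """Map x to a signed fixed-point value.

    `total_bits` is the full word width and `int_bits` the integer part
    including the sign bit. Rounding to nearest and saturating at the range
    limits are assumptions of this sketch.
    """
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lowest = -(2 ** (total_bits - 1))
    highest = 2 ** (total_bits - 1) - 1
    q = max(lowest, min(highest, round(x * scale)))
    return q / scale


# Made-up example weights, not taken from the paper.
weights = [0.8, -0.05, 0.3, -0.6, 0.01, 0.12, -0.9, 0.02]

# Remove the 50% of weights closest to zero, then quantize the survivors
# to 6 bits with 1 integer (sign) bit, i.e. steps of 1/32 in [-1, 31/32].
pruned = prune_by_magnitude(weights, sparsity=0.5)
compressed = [quantize_fixed_point(w, total_bits=6, int_bits=1) for w in pruned]
print(compressed)
```

Zeroed weights need no multiplier on the FPGA, and narrower fixed-point words shrink each remaining multiplication, which is the intuition behind the resource reductions reported in the summary.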