
A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

Bibliographic Details
Published in: Journal of Physics: Conference Series, 2018-05, Vol. 1026 (1), p. 012019
Main Authors: Huang, Y., Shen, J., Wang, Z., Wen, M., Zhang, C.
Format: Article
Language:English
Description
Summary: Convolutional neural networks (CNNs) are widely used in many computer vision applications. Previous FPGA implementations of CNNs are mainly based on the conventional convolution algorithm. However, the high arithmetic complexity of conventional convolution restricts the performance of accelerators and significantly increases the difficulty of design. The Winograd algorithm has been shown to effectively reduce the computational complexity of convolution in CNNs. Although a few FPGA approaches based on the Winograd algorithm have been implemented, these works lack an evaluation of performance across different tile sizes of the Winograd algorithm. In this work, we explore the use of the Winograd algorithm to accelerate CNNs on FPGAs. First, we propose an accelerator architecture that applies to both convolutional layers and fully connected layers. Second, we use a high-level synthesis tool to implement our design efficiently. Finally, we evaluate our accelerator with different tile sizes in terms of resource utilization, performance, and efficiency. On the VUS440 platform, we achieve an average of 943 GOPS for the overall VGG16 network at low resource utilization, reaching higher efficiency than state-of-the-art FPGA implementations.
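
To illustrate the complexity reduction the summary refers to, here is a minimal sketch (not the authors' implementation) of the one-dimensional Winograd transform F(2,3), which computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct convolution; the 2D tiles used in CNN accelerators nest this same idea along both spatial axes. The function name and test values below are illustrative only.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter using 4 multiplications.

    d: input tile of 4 samples, g: 3-tap filter.
    Direct 1D convolution of the same outputs would need 6 multiplications.
    """
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

# Verify against direct convolution on a random tile.
rng = np.random.default_rng(0)
d = rng.standard_normal(4)
g = rng.standard_normal(3)
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

In hardware, the filter-side transforms (the terms involving only g) can be precomputed, so larger tile sizes trade fewer multiplications for more transform adders and storage, which is the resource/performance trade-off the paper evaluates.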
ISSN: 1742-6588 (print), 1742-6596 (online)
DOI: 10.1088/1742-6596/1026/1/012019