Parallel strategies for 2D Discrete Wavelet Transform in shared memory systems and GPUs

Bibliographic Details
Published in:	The Journal of supercomputing 2013-04, Vol.64 (1), p.4-16
Main Authors:	Galiano, V., López, O., Malumbres, M. P., Migallón, H.
Format:	Article
Language:	English
Subjects:	Algorithms; Central processing units; Compilers; Computer Science; Convolution; Discrete Wavelet Transform; Interpreters; Platforms; Processor Architectures; Programming Languages; Reduction; Strategy; Two dimensional

Description
Summary:	In this work, we analyze the behavior of several parallel algorithms developed to compute the two-dimensional discrete wavelet transform using both OpenMP on a multicore platform and CUDA on a GPU. The proposed parallel algorithms are based on both regular filter-bank convolution and the lifting transform, with small implementation changes aimed at reducing both memory requirements and complexity. We compare our implementations against sequential CPU algorithms and other recently proposed algorithms, such as the SMDWT algorithm on different CPUs and the Wippig & Klauer algorithm on a GTX280 GPU. Finally, we analyze their behavior when the algorithms are adapted to each architecture. Significant execution time improvements are achieved on both multicore platforms and GPUs. Depending on the multicore platform used, we achieve speed-ups of 1.9 and 3.4 using two and four processes, respectively, when compared to the sequential CPU algorithm, or speed-ups of 7.1 and 8.9 using eight and ten processes. Regarding GPUs, the GPU convolution algorithm that uses GPU shared memory obtains speed-ups of up to 20 when compared to the sequential CPU algorithm.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-012-0750-5
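
Illustration:	The summary mentions a lifting-based 2D transform. The following is a minimal sequential sketch of one decomposition level, not the authors' implementation. It assumes the Haar wavelet (the record does not say which filters the article uses), an image with even width and height, and hypothetical function names (`lift_1d`, `unlift_1d`, `dwt2`). The sub-band names follow one common convention and are likewise an assumption.

```python
def lift_1d(x):
    """One level of Haar lifting on an even-length sequence (assumed wavelet).

    Returns (approx, detail), each half the length of x.
    """
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]          # predict step
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update step (pair mean)
    return approx, detail


def unlift_1d(approx, detail):
    """Invert lift_1d by undoing the update step, then the predict step."""
    even = [s - d / 2 for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out


def transpose(m):
    return [list(row) for row in zip(*m)]


def dwt2(image):
    """One level of a separable 2D transform: row pass, then column pass.

    Returns the four sub-bands (ll, lh, hl, hh) as lists of lists.
    """
    # Row pass: every row is transformed independently of the others.
    rows = [lift_1d(r) for r in image]
    low = [a for a, _ in rows]
    high = [d for _, d in rows]

    def column_pass(block):
        # Column pass: every column is likewise independent.
        cols = [lift_1d(c) for c in transpose(block)]
        return (transpose([a for a, _ in cols]),
                transpose([d for _, d in cols]))

    ll, lh = column_pass(low)
    hl, hh = column_pass(high)
    return ll, lh, hl, hh


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ll, lh, hl, hh = dwt2(image)
print(ll)  # each entry is the mean of one 2x2 block of the input
```

Because each row, and then each column, is processed independently, the two passes are natural candidates for the kind of loop-level parallelism that OpenMP or a GPU kernel could exploit; this sketch itself runs sequentially.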