Deep Learning-Based Empirical and Sub-Space Decomposition for Speech Enhancement

Bibliographic Details
Published in:	Circuits, Systems, and Signal Processing, 2024-06, Vol.43 (6), p.3596-3626
Main Authors:	Mraihi, Khaoula; Ben Messaoud, Mohamed Anouar
Format:	Article
Language:	English
Subjects:	Circuits and Systems; Decomposition; Deep learning; Electrical Engineering; Electronics and Microelectronics; Engineering; Instrumentation; Signal, Image and Speech Processing; Sparse matrices; Speech processing; Wavelet transforms

Description
Summary:	This research presents a single-channel speech enhancement approach that combines the adaptive empirical wavelet transform with an improved sub-space decomposition method, followed by a deep learning network. The adaptive empirical wavelet transform determines the segment boundaries. The resulting spectrogram of the noisy speech is then decomposed into three sub-spaces: a low-rank matrix, a sparse matrix, and a residual matrix that acts as a perturbation. The low-rank decomposition uses nonnegative factorization to avoid the residual noise that degrades speech quality. A cross-domain learning framework is then developed to capture the correlations along the frequency and time axes and to avoid the disadvantages of the time–frequency domain. Experimental results show that the proposed approach outperforms several competing speech enhancement methods and achieves the highest PESQ, Cov and STOI scores under different noise types and at low SNR values on both datasets. The proposed model is also tested on a hardware-level manual design that accelerates execution of the deep learning model on an FPGA.
ISSN:	0278-081X 1531-5878
DOI:	10.1007/s00034-024-02606-4
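
The summary describes splitting a noisy-speech spectrogram into a low-rank matrix, a sparse matrix, and a residual matrix, with nonnegative factorization supplying the low-rank part. The sketch below illustrates that general idea only; it is not the authors' algorithm. It assumes a plain multiplicative-update NMF for the low-rank term and a simple magnitude threshold for the sparse term, and the names `nmf`, `three_way_split`, `rank`, and `keep_fraction` are hypothetical. The input is a synthetic nonnegative matrix standing in for a magnitude spectrogram.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF (Frobenius objective): V ~ W @ H, with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def three_way_split(V, rank=4, keep_fraction=0.05):
    """Split V into low_rank + sparse + residual (assumed, simplified scheme)."""
    W, H = nmf(V, rank)
    low_rank = W @ H                      # nonnegative, rank <= `rank`
    remainder = V - low_rank
    # Keep only the largest-magnitude entries of the remainder as the sparse part.
    k = max(1, int(keep_fraction * remainder.size))
    threshold = np.partition(np.abs(remainder).ravel(), -k)[-k]
    sparse = np.where(np.abs(remainder) >= threshold, remainder, 0.0)
    residual = remainder - sparse         # whatever the first two parts miss
    return low_rank, sparse, residual

# Synthetic stand-in for a magnitude spectrogram: 64 frequency bins x 50 frames.
rng = np.random.default_rng(1)
V = rng.random((64, 4)) @ rng.random((4, 50)) + 0.01 * rng.random((64, 50))
low_rank, sparse, residual = three_way_split(V, rank=4, keep_fraction=0.05)
print("nonzero sparse entries:", np.count_nonzero(sparse))
print("residual energy share:", np.sum(residual**2) / np.sum(V**2))
```

By construction the three parts sum back to the input, so the split loses no information; the choices of `rank` and `keep_fraction` only decide how the energy is divided among the three matrices.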