Deep Learning-Based Empirical and Sub-Space Decomposition for Speech Enhancement
Published in: | Circuits, systems, and signal processing, 2024-06, Vol.43 (6), p.3596-3626 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This research presents a single-channel speech enhancement approach that combines the adaptive empirical wavelet transform with an improved sub-space decomposition method, followed by a deep learning network. The adaptive empirical wavelet transform is used to determine the segment boundaries. The spectrogram of the noisy speech is then decomposed into three sub-spaces to determine its low-rank matrix and sparse matrix under the perturbation of the residual matrix. Residual noise that degrades speech quality is avoided by performing the low-rank decomposition with nonnegative factorization. A cross-domain learning framework is then developed to capture the correlations along the frequency and time axes and avoid the disadvantages of the time–frequency domain. Experimental results show that the proposed approach outperforms several competing speech enhancement methods and achieves the highest PESQ, Cov and STOI scores under different types of noise and at low SNR values on the two datasets. The proposed model is also tested on a manual hardware-level design that accelerates execution of the developed deep learning model on an FPGA. |
---|---|
ISSN: | 0278-081X 1531-5878 |
DOI: | 10.1007/s00034-024-02606-4 |