
On Learning Spectral Masking for Single Channel Speech Enhancement Using Feedforward and Recurrent Neural Networks

Bibliographic Details
Published in: IEEE Access, 2020, Vol. 8, p. 160581-160595
Main Authors: Saleem, Nasir; Khattak, Muhammad Irfan; Al-Hasan, Muath; Qazi, Abdul Baseer
Format: Article
Language: English
Description
Summary: Human speech in real-world environments is typically degraded by background noise. The noise reduces perceptual speech quality and intelligibility, which degrades the performance of speech-related applications such as hearing aids and automatic speech recognition systems. It also distorts the phase of the clean speech and introduces perceptual disturbances that further reduce speech quality. Speech enhancement therefore needs careful treatment in everyday listening environments. In this article, speech enhancement is performed using supervised learning of spectral masking. Deep neural networks (DNN) and recurrent neural networks (RNN) are trained to learn the spectral mask from the magnitude spectrograms of the degraded speech. An iterative procedure is adopted as a post-processing step to deal with the noisy phase. Additionally, an intelligibility improvement filter is used to incorporate the critical band importance function weights, where higher weights contribute more towards intelligibility. Systematic experiments demonstrated that the proposed approaches greatly attenuated the background noise and led to large improvements in perceived speech quality and intelligibility, as well as in automatic speech recognition. The experiments use the TIMIT database. STOI is improved by 17.6% over the noisy speech, and SDR and PESQ are improved by 5.22 dB and 19%, respectively, over the noisy speech utterances. These comparisons showed that the proposed speech enhancement approaches outperformed related speech enhancement methods.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3021061
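
The summary describes learning a spectral mask from magnitude spectrograms but does not say which mask is used. As a rough illustration only, the sketch below assumes a ratio mask computed from known clean and noise magnitudes (the kind of target a network could be trained to estimate) and applies it to a noisy spectrogram while reusing the noisy phase. The function names, the `beta` exponent, and the `eps` constant are hypothetical choices, and the iterative phase post-processing and the intelligibility filter mentioned in the summary are not modelled here.

```python
import cmath


def ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-12):
    """Per-bin ratio mask in [0, 1] from clean and noise magnitudes.

    Assumption: mask = (S^2 / (S^2 + N^2)) ** beta; the article may use
    a different mask definition.
    """
    return [
        [(s * s / (s * s + n * n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(clean_mag, noise_mag)
    ]


def apply_mask(noisy_spec, mask):
    """Scale each noisy magnitude by its mask value and keep the noisy phase."""
    return [
        [cmath.rect(m * abs(y), cmath.phase(y)) for y, m in zip(y_row, m_row)]
        for y_row, m_row in zip(noisy_spec, mask)
    ]


# One time frame with two frequency bins (toy values).
clean = [[1.0, 0.0]]
noise = [[0.0, 1.0]]
noisy = [[3 + 4j, 1 + 1j]]

mask = ratio_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
```

In this toy frame the first bin contains only speech and the second only noise, so the mask passes the first bin almost unchanged and suppresses the second. In a supervised setup, a network would predict the mask from the noisy magnitudes alone, because the clean and noise signals are not available at test time.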