SNR-dependent non-uniform spectral compression for noisy speech recognition

Bibliographic Details
Main Authors:	Chu, K.K., Leung, S.H.
Format:	Conference Proceeding
Language:	eng
Subjects:	Acoustic noise; Applied sciences; Background noise; Band pass filters; Cepstral analysis; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Feature extraction; Filtering; Humans; Information, signal and communications theory; Mel frequency cepstral coefficient; Signal and communications theory; Signal processing; Signal, noise; Speech processing; Speech recognition; Telecommunications and information theory; Working environment noise

Description
Summary:	The perceived loudness of a tone is known to be spectrally masked by background noise. This masking not only shifts the just-audible sound pressure level of the tone, but also yields a masked loudness function with a steeper slope than the unmasked one. This property of perceived loudness motivates a new mel-scale-based feature extraction method with non-uniform spectral compression for speech recognition in noisy environments. In this method, the speech power spectrum undergoes mel-scaled band-pass filtering, as in the standard MFCC front-end. However, the filter output energies are compressed by different root values defined by a compression function, which depends on the SNR in each filter band. With this scheme of SNR-dependent non-uniform spectral compression (SNSC) for mel-scaled filter-bank-based cepstral coefficients, substantial improvement is found for recognition in different noisy environments, compared with the standard MFCC and with features derived using cubic root spectral compression.
ISSN:	1520-6149, 2379-190X
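
The per-band compression described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the record does not give the compression function, so `compression_root` is a hypothetical sigmoid that maps band SNR to a root exponent between the cubic root (1/3) and 1, with a larger exponent at low SNR to mimic the steeper masked loudness slope. The speech estimate by subtracting a noise estimate, the parameter values, and all function names are likewise assumptions.

```python
import math

def compression_root(snr_db, low=1.0 / 3.0, high=1.0, midpoint_db=10.0, slope=0.2):
    """Hypothetical compression function (not from the paper).

    Maps a band SNR in dB to a root exponent: close to `high` at low SNR,
    close to `low` (cubic root) at high SNR.
    """
    weight = 1.0 / (1.0 + math.exp(slope * (snr_db - midpoint_db)))
    return low + (high - low) * weight

def band_snr_db(speech_energy, noise_energy, floor=1e-10):
    """SNR of one filter band in dB, with a floor to avoid log of zero."""
    return 10.0 * math.log10(max(speech_energy, floor) / max(noise_energy, floor))

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, as commonly used to turn band values into cepstra."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (j + 0.5) / n) for j, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def snsc_cepstrum(band_energies, noise_energies, n_coeffs=13):
    """Compress each mel band energy by its own SNR-dependent root, then DCT.

    band_energies: mel filter-bank output energies of the noisy frame.
    noise_energies: per-band noise energy estimates (assumed to be available).
    """
    compressed = []
    for energy, noise in zip(band_energies, noise_energies):
        # Assumption: speech energy is estimated by subtracting the noise estimate.
        snr = band_snr_db(max(energy - noise, 0.0), noise)
        root = compression_root(snr)
        compressed.append(max(energy, 1e-10) ** root)
    return dct_ii(compressed, min(n_coeffs, len(compressed)))

if __name__ == "__main__":
    energies = [5.0 + i for i in range(20)]   # made-up band energies
    noise = [1.0] * 20                        # made-up noise estimates
    print(snsc_cepstrum(energies, noise)[:3])
```

Replacing `compression_root` with a constant 1/3 gives the cubic root compression that the summary uses as a baseline, which makes the two front-ends easy to compare in this sketch.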