Perceptually non-uniform spectral compression for noisy speech recognition

Bibliographic Details
Main Authors: Chu, K.K., Leung, S.H., Yip, C.S.
Format: Conference Proceeding
Language: English
Description
Summary: Loudness is a function of sound pressure level. The power law used to approximate the loudness function has an exponent that depends on the bandwidth of the sound: it decreases from about 0.3 for a narrow-band tone to 0.23 for broadband uniform-exciting noise. Exploiting this psychoacoustic property of hearing, this paper proposes a new feature extraction method for robust speech recognition with FFT-based front ends. Instead of applying a fixed compression to all frequency bands, as in root cepstral analysis or perceptual linear prediction (PLP), the method applies stronger compression to the broadband-like high-frequency bands of each frame's power spectrum. In addition, frames with broadband characteristics, such as those of fricatives, are also compressed more strongly, with the frame energy serving as the index that determines the degree of compression. This non-uniform spectral compression significantly improves recognition accuracy, especially at very low SNR in white noise.
ISSN: 1520-6149; 2379-190X
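The compression scheme described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear ramp of exponents from 0.3 (lowest bin) to 0.23 (highest bin), the energy thresholds, the maximum extra compression, and the function names band_exponents, frame_offset and compress_frame are all hypothetical. In particular, the summary states only that frame energy indexes the degree of compression; treating lower-energy frames as the broadband, fricative-like ones that receive extra compression is an assumption.

```python
import math


def band_exponents(num_bins, alpha_low=0.3, alpha_high=0.23):
    """Hypothetical per-bin exponents: a linear ramp from alpha_low at the
    lowest bin to alpha_high at the highest bin. A smaller exponent means
    stronger compression, so high-frequency bins are compressed more."""
    if num_bins == 1:
        return [alpha_low]
    step = (alpha_high - alpha_low) / (num_bins - 1)
    return [alpha_low + k * step for k in range(num_bins)]


def frame_offset(energy_db, low_db=40.0, high_db=70.0, max_offset=0.05):
    """Hypothetical energy-dependent reduction of the exponents.
    ASSUMPTION: frames at or below low_db are treated as broadband,
    fricative-like frames and get the full max_offset; frames at or above
    high_db get none; values in between are interpolated linearly."""
    t = (energy_db - low_db) / (high_db - low_db)
    t = min(max(t, 0.0), 1.0)
    return max_offset * (1.0 - t)


def compress_frame(power_spectrum, alpha_low=0.3, alpha_high=0.23):
    """Apply frequency- and energy-dependent power-law compression to one
    frame's (non-negative) power spectrum."""
    # Frame energy in dB; the small constant avoids log10(0) for silent frames.
    energy_db = 10.0 * math.log10(sum(power_spectrum) + 1e-12)
    offset = frame_offset(energy_db)
    exps = band_exponents(len(power_spectrum), alpha_low, alpha_high)
    # Each bin is raised to its own (possibly energy-reduced) exponent.
    return [p ** (a - offset) for p, a in zip(power_spectrum, exps)]


if __name__ == "__main__":
    quiet = compress_frame([100.0] * 4)  # low energy: extra compression
    loud = compress_frame([1e7] * 4)     # high energy: no extra compression
    print(quiet)
    print(loud)
```

In root cepstral analysis, a compressed spectrum like this would be passed to an inverse transform in place of the logarithm used for conventional cepstra; that step is omitted here to keep the sketch focused on the non-uniform compression itself.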