Speaker and Session Variability in GMM-Based Speaker Verification

Bibliographic Details
Published in:	IEEE Transactions on Audio, Speech, and Language Processing, 2007-05, Vol.15 (4), p.1448-1460
Main Authors:	Kenny, P., Boulianne, G., Ouellet, P., Dumouchel, P.
Format:	Article
Language:	English
Subjects:	Adaptation model; Applied sciences; Availability; Error analysis; Exact sciences and technology; Factor analysis; Gaussian mixture; Information, signal and communications theory; Large-scale systems; Mathematical models; Natural language processing; NIST; Performance analysis; Performance evaluation; Signal processing; Speaker recognition; Speaker verification; Speech processing; Speech recognition; Statistical analysis; Statistical distributions; Statistics; Studies; Telecommunications and information theory; Testing; Trains

Description
Summary:	We present a corpus-based approach to speaker verification in which maximum-likelihood II criteria are used to train a large-scale generative model of speaker and session variability which we call joint factor analysis. Enrolling a target speaker consists in calculating the posterior distribution of the hidden variables in the factor analysis model, and verification tests are conducted using a new type of likelihood II ratio statistic. Using the NIST 1999 and 2000 speaker recognition evaluation data sets, we show that the effectiveness of this approach depends on the availability of a training corpus which is well matched with the evaluation set used for testing. Experiments on the NIST 1999 evaluation set using a mismatched corpus to train factor analysis models did not result in any improvement over standard methods, but we found that, even with this type of mismatch, feature warping performs extremely well in conjunction with the factor analysis model, and this enabled us to obtain very good results (equal error rates of about 6.2%).
ISSN:	1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI:	10.1109/TASL.2007.894527