Rapid unsupervised adaptation to children's speech on a connected-digit task

Bibliographic Details
Main Authors: Burnett, D.C., Fanty, M.
Format: Conference Proceeding
Language: English
Description
Summary: We are exploring ways in which to rapidly adapt our neural network classifiers to new speakers and conditions using very small amounts of speech, say, one or a few words. Our approach is to perform a speaker-dependent warping of the frequency scale by selecting a Bark offset for each speaker. We choose the offset for a speaker to be the one that maximizes our recognizer output score on the adaptation utterance. We then use the speaker's offset during evaluation of all other utterances by the speaker. To test our approach, we evaluate an adult-speech trained recognizer on children's speech from the same task both before and after adaptation to each child's voice. Using only a single digit for adaptation, we have reduced the word error rate for children's speech from 9.6% to 4.2%. Using a seven-digit utterance further reduced the error rate to 3.5%.
DOI: 10.1109/ICSLP.1996.607809
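
Illustration: The offset-selection step described in the summary (pick the Bark offset that maximizes the recognizer output score on the adaptation utterance) can be sketched as a simple search over candidate offsets. This is only a minimal sketch, not the authors' implementation: the record does not state the candidate range, the step size, or how the recognizer score is computed, so `select_bark_offset`, `toy_score`, and the candidate grid below are all hypothetical stand-ins.

```python
from typing import Callable, Sequence, TypeVar

U = TypeVar("U")  # type of an utterance (features, waveform, ...); assumed


def select_bark_offset(
    adaptation_utterance: U,
    candidate_offsets: Sequence[float],
    score: Callable[[U, float], float],
) -> float:
    """Return the candidate offset with the highest score.

    `score(utterance, offset)` is a hypothetical callable standing in for
    "warp the frequency scale by `offset` Bark, run the recognizer, and
    return its output score".
    """
    if not candidate_offsets:
        raise ValueError("candidate_offsets must not be empty")
    return max(
        candidate_offsets,
        key=lambda offset: score(adaptation_utterance, offset),
    )


def toy_score(utterance: float, offset: float) -> float:
    # Toy stand-in for a recognizer: the "utterance" is just the offset
    # that suits the speaker best, and the score peaks at that value.
    return -((offset - utterance) ** 2)


# Hypothetical candidate grid: -2.0 to +2.0 Bark in 0.5 Bark steps.
candidates = [k * 0.5 for k in range(-4, 5)]

best_offset = select_bark_offset(1.0, candidates, toy_score)
print(best_offset)
```

In the scheme the summary describes, the selected offset would then be held fixed and applied to all other utterances from the same speaker.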