
Eigenvoice modeling with sparse training data

We derive an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and show how it can be regarded as a new method of eigenvoice estimation. Unlike other approaches to the problem of estimating eigenvoice...

Bibliographic Details
Published in:	IEEE transactions on speech and audio processing 2005-05, Vol.13 (3), p.345-354
Main Authors:	Kenny, P., Boulianne, G., Dumouchel, P.
Format:	Article
Language:	English
Subjects:	Applied sciences; Cluster adaptive training; Clusters; Covariance matrix; Eigenvalues and eigenfunctions; eigenvoices; Equivalence; Exact sciences and technology; Exact solutions; extended MAP (EMAP); H infinity control; Hidden Markov models; Infinity; Information, signal and communications theory; Loudspeakers; Mathematical analysis; Mathematical models; Maximum likelihood estimation; Principal component analysis; Signal processing; speaker adaptation; Speech; Speech processing; Speech recognition; Telecommunications and information theory; Testing; Training; Training data

cited_by	cdi_FETCH-LOGICAL-c381t-7bf679c05ec39add694f012181b6678850c59285ee63829492b9cb762cc1630e3
cites	cdi_FETCH-LOGICAL-c381t-7bf679c05ec39add694f012181b6678850c59285ee63829492b9cb762cc1630e3
container_end_page	354
container_issue	3
container_start_page	345
container_title	IEEE transactions on speech and audio processing
container_volume	13
creator	Kenny, P.; Boulianne, G.; Dumouchel, P.
description	We derive an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and show how it can be regarded as a new method of eigenvoice estimation. Unlike other approaches to the problem of estimating eigenvoices in situations where speaker-dependent training is not feasible, our method enables us to estimate as many eigenvoices from a given training set as there are training speakers. In the limit as the amount of training data for each speaker tends to infinity, it is equivalent to cluster adaptive training.
doi_str_mv	10.1109/TSA.2004.840940
format	article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1109_TSA_2004_840940</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1420369</ieee_id><sourcerecordid>919934512</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-7bf679c05ec39add694f012181b6678850c59285ee63829492b9cb762cc1630e3</originalsourceid><addsrcrecordid>eNp9kM1LAzEQxYMoWKtnD16KoJ62nXxuciylfkDBg_UcstlsTdnu1mSr-N-bsoWCB08zzPzeMO8hdI1hjDGoyfJtOiYAbCwZKAYnaIA5lxmhnJ6mHgTNhMjFObqIcQ0AEudsgLK5X7nmq_XWjTZt6WrfrEbfvvsYxa0J0Y26YHyzH5amM5forDJ1dFeHOkTvj_Pl7DlbvD69zKaLzFKJuywvKpErC9xZqkxZCsUqwARLXKQXpORguSKSOyeoJIopUihb5IJYiwUFR4foob-7De3nzsVOb3y0rq5N49pd1AorRRnHJJH3_5JEAmAp8wTe_gHX7S40yYWWklJFKIUETXrIhjbG4Cq9DX5jwo_GoPcp65Sy3qes-5ST4u5w1kRr6iqYxvp4lInkXeQ8cTc9551zxzUjQIWivx-kgiw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>883392330</pqid></control><display><type>article</type><title>Eigenvoice modeling with sparse training data</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Kenny, P. ; Boulianne, G. ; Dumouchel, P.</creator><creatorcontrib>Kenny, P. ; Boulianne, G. ; Dumouchel, P.</creatorcontrib><description>We derive an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and show how it can be regarded as a new method of eigenvoice estimation. Unlike other approaches to the problem of estimating eigenvoices in situations where speaker-dependent training is not feasible, our method enables us to estimate as many eigenvoices from a given training set as there are training speakers. 
In the limit as the amount of training data for each speaker tends to infinity, it is equivalent to cluster adaptive training.</description><identifier>ISSN: 1063-6676</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-2353</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TSA.2004.840940</identifier><identifier>CODEN: IESPEJ</identifier><language>eng</language><publisher>New York, NY: IEEE</publisher><subject>Applied sciences ; Cluster adaptive training ; Clusters ; Covariance matrix ; Eigenvalues and eigenfunctions ; eigenvoices ; Equivalence ; Exact sciences and technology ; Exact solutions ; extended MAP (EMAP) ; H infinity control ; Hidden Markov models ; Infinity ; Information, signal and communications theory ; Loudspeakers ; Mathematical analysis ; Mathematical models ; Maximum likelihood estimation ; Principal component analysis ; Signal processing ; speaker adaptation ; Speech ; Speech processing ; Speech recognition ; Telecommunications and information theory ; Testing ; Training ; Training data</subject><ispartof>IEEE transactions on speech and audio processing, 2005-05, Vol.13 (3), p.345-354</ispartof><rights>2005 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2005</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-7bf679c05ec39add694f012181b6678850c59285ee63829492b9cb762cc1630e3</citedby><cites>FETCH-LOGICAL-c381t-7bf679c05ec39add694f012181b6678850c59285ee63829492b9cb762cc1630e3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1420369$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,786,790,27957,27958,55147</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16694675$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Kenny, P.</creatorcontrib><creatorcontrib>Boulianne, G.</creatorcontrib><creatorcontrib>Dumouchel, P.</creatorcontrib><title>Eigenvoice modeling with sparse training data</title><title>IEEE transactions on speech and audio processing</title><addtitle>T-SAP</addtitle><description>We derive an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and show how it can be regarded as a new method of eigenvoice estimation. Unlike other approaches to the problem of estimating eigenvoices in situations where speaker-dependent training is not feasible, our method enables us to estimate as many eigenvoices from a given training set as there are training speakers. 
In the limit as the amount of training data for each speaker tends to infinity, it is equivalent to cluster adaptive training.</description><subject>Applied sciences</subject><subject>Cluster adaptive training</subject><subject>Clusters</subject><subject>Covariance matrix</subject><subject>Eigenvalues and eigenfunctions</subject><subject>eigenvoices</subject><subject>Equivalence</subject><subject>Exact sciences and technology</subject><subject>Exact solutions</subject><subject>extended MAP (EMAP)</subject><subject>H infinity control</subject><subject>Hidden Markov models</subject><subject>Infinity</subject><subject>Information, signal and communications theory</subject><subject>Loudspeakers</subject><subject>Mathematical analysis</subject><subject>Mathematical models</subject><subject>Maximum likelihood estimation</subject><subject>Principal component analysis</subject><subject>Signal processing</subject><subject>speaker adaptation</subject><subject>Speech</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>Telecommunications and information theory</subject><subject>Testing</subject><subject>Training</subject><subject>Training data</subject><issn>1063-6676</issn><issn>2329-9290</issn><issn>1558-2353</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><recordid>eNp9kM1LAzEQxYMoWKtnD16KoJ62nXxuciylfkDBg_UcstlsTdnu1mSr-N-bsoWCB08zzPzeMO8hdI1hjDGoyfJtOiYAbCwZKAYnaIA5lxmhnJ6mHgTNhMjFObqIcQ0AEudsgLK5X7nmq_XWjTZt6WrfrEbfvvsYxa0J0Y26YHyzH5amM5forDJ1dFeHOkTvj_Pl7DlbvD69zKaLzFKJuywvKpErC9xZqkxZCsUqwARLXKQXpORguSKSOyeoJIopUihb5IJYiwUFR4foob-7De3nzsVOb3y0rq5N49pd1AorRRnHJJH3_5JEAmAp8wTe_gHX7S40yYWWklJFKIUETXrIhjbG4Cq9DX5jwo_GoPcp65Sy3qes-5ST4u5w1kRr6iqYxvp4lInkXeQ8cTc9551zxzUjQIWivx-kgiw</recordid><startdate>20050501</startdate><enddate>20050501</enddate><creator>Kenny, P.</creator><creator>Boulianne, G.</creator><creator>Dumouchel, 
P.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7SP</scope><scope>F28</scope><scope>FR3</scope></search><sort><creationdate>20050501</creationdate><title>Eigenvoice modeling with sparse training data</title><author>Kenny, P. ; Boulianne, G. ; Dumouchel, P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-7bf679c05ec39add694f012181b6678850c59285ee63829492b9cb762cc1630e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Applied sciences</topic><topic>Cluster adaptive training</topic><topic>Clusters</topic><topic>Covariance matrix</topic><topic>Eigenvalues and eigenfunctions</topic><topic>eigenvoices</topic><topic>Equivalence</topic><topic>Exact sciences and technology</topic><topic>Exact solutions</topic><topic>extended MAP (EMAP)</topic><topic>H infinity control</topic><topic>Hidden Markov models</topic><topic>Infinity</topic><topic>Information, signal and communications theory</topic><topic>Loudspeakers</topic><topic>Mathematical analysis</topic><topic>Mathematical models</topic><topic>Maximum likelihood estimation</topic><topic>Principal component analysis</topic><topic>Signal processing</topic><topic>speaker adaptation</topic><topic>Speech</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>Telecommunications and information theory</topic><topic>Testing</topic><topic>Training</topic><topic>Training data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kenny, P.</creatorcontrib><creatorcontrib>Boulianne, 
G.</creatorcontrib><creatorcontrib>Dumouchel, P.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEL</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Electronics & Communications Abstracts</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><jtitle>IEEE transactions on speech and audio processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kenny, P.</au><au>Boulianne, G.</au><au>Dumouchel, P.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Eigenvoice modeling with sparse training data</atitle><jtitle>IEEE transactions on speech and audio processing</jtitle><stitle>T-SAP</stitle><date>2005-05-01</date><risdate>2005</risdate><volume>13</volume><issue>3</issue><spage>345</spage><epage>354</epage><pages>345-354</pages><issn>1063-6676</issn><issn>2329-9290</issn><eissn>1558-2353</eissn><eissn>2329-9304</eissn><coden>IESPEJ</coden><notes>ObjectType-Article-2</notes><notes>SourceType-Scholarly Journals-1</notes><notes>ObjectType-Feature-1</notes><notes>content type line 23</notes><abstract>We derive an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and show how it can be regarded as a 
new method of eigenvoice estimation. Unlike other approaches to the problem of estimating eigenvoices in situations where speaker-dependent training is not feasible, our method enables us to estimate as many eigenvoices from a given training set as there are training speakers. In the limit as the amount of training data for each speaker tends to infinity, it is equivalent to cluster adaptive training.</abstract><cop>New York, NY</cop><pub>IEEE</pub><doi>10.1109/TSA.2004.840940</doi><tpages>10</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1063-6676
ispartof	IEEE transactions on speech and audio processing, 2005-05, Vol.13 (3), p.345-354
issn	1063-6676 2329-9290 1558-2353 2329-9304
language	eng
recordid	cdi_crossref_primary_10_1109_TSA_2004_840940
source	IEEE Electronic Library (IEL) Journals
subjects	Applied sciences; Cluster adaptive training; Clusters; Covariance matrix; Eigenvalues and eigenfunctions; eigenvoices; Equivalence; Exact sciences and technology; Exact solutions; extended MAP (EMAP); H infinity control; Hidden Markov models; Infinity; Information, signal and communications theory; Loudspeakers; Mathematical analysis; Mathematical models; Maximum likelihood estimation; Principal component analysis; Signal processing; speaker adaptation; Speech; Speech processing; Speech recognition; Telecommunications and information theory; Testing; Training; Training data
title	Eigenvoice modeling with sparse training data
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-23T02%3A32%3A32IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Eigenvoice%20modeling%20with%20sparse%20training%20data&rft.jtitle=IEEE%20transactions%20on%20speech%20and%20audio%20processing&rft.au=Kenny,%20P.&rft.date=2005-05-01&rft.volume=13&rft.issue=3&rft.spage=345&rft.epage=354&rft.pages=345-354&rft.issn=1063-6676&rft.eissn=1558-2353&rft.coden=IESPEJ&rft_id=info:doi/10.1109/TSA.2004.840940&rft_dat=%3Cproquest_cross%3E919934512%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c381t-7bf679c05ec39add694f012181b6678850c59285ee63829492b9cb762cc1630e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=883392330&rft_id=info:pmid/&rft_ieee_id=1420369&rfr_iscdi=true