
Sub-word Based End-to-End Speech Recognition for an Under-Resourced Language: Amharic

Bibliographic Details
Main Authors: Gebreegziabher, Nirayo Hailu; Nurnberger, Andreas
Format: Conference Proceeding
Language: English
Description
Summary: In this work, we focused on end-to-end speech recognition for Amharic, a less-resourced language. The result can be integrated with other tasks such as spoken content retrieval. We explored three models built from Convolutional Neural Networks, Recurrent Neural Networks, and Connectionist Temporal Classification (CTC) for end-to-end speech recognition on a less-resourced language. Further, we studied the possibility of an end-to-end system that produces 1-best output while keeping the number of network parameters and the computational resources minimal. The paper focuses on finding a more suitable sub-lexical unit for the Amharic end-to-end speech recognition system, one that can also serve as an audio indexing unit. We present the first results comparing grapheme-, phoneme-, and syllable-based end-to-end speech recognition systems for our target language. The models are evaluated on an Amharic speech corpus of approximately 52 hours containing read speech, audiobooks, and multi-genre radio programs. On the test set, we report a character error rate (CER) of 19.21% and a syllable error rate (SER) of 39.98% for a syllable-based end-to-end model without an integrated lexicon or language model.
ISSN: 2577-1655
DOI: 10.1109/SMC42975.2020.9283401
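The summary mentions CTC models that produce 1-best output and reports results as character and syllable error rates. The Python sketch below illustrates both ideas and is not the authors' code. It shows best-path (greedy) CTC decoding: take the top label per frame, merge consecutive repeats, and drop blanks. It also shows a Levenshtein-based error rate, computed as (substitutions + deletions + insertions) divided by the reference length. The blank index 0, the toy frame scores, and the romanized syllable segmentation are hypothetical. This record does not state the decoder or tokenization the paper used.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

BLANK = 0  # hypothetical: assume label index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding (an approximation of 1-best output):
    pick the top-scoring label per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded: List[int] = []
    prev: Optional[int] = None
    for label in best:
        # A repeated label is emitted again only if a different label (e.g. blank) separates it.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def edit_distance(ref: Sequence[T], hyp: Sequence[T]) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions and insertions."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            row[j] = min(prev_row[j] + 1,              # deletion
                         row[j - 1] + 1,               # insertion
                         prev_row[j - 1] + (r != h))   # substitution (or match)
        prev_row = row
    return prev_row[-1]


def error_rate(ref: Sequence[T], hyp: Sequence[T]) -> float:
    """(S + D + I) / N over whatever units the sequences hold (characters, syllables, ...)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Toy per-frame scores over 3 labels: {0: blank, 1, 2} (hypothetical values).
    frames = [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.9, 0.05, 0.05],
              [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
    print(ctc_greedy_decode(frames))  # [1, 1, 2]

    # The same single error scored over characters vs. (hypothetical) romanized syllables.
    print(error_rate(list("selam"), list("salam")))                     # 0.2
    print(round(error_rate(["se", "la", "m"], ["sa", "la", "m"]), 3))   # 0.333
```

In this toy case one wrong vowel costs 1/5 at the character level but 1/3 at the syllable level, because syllable sequences are shorter. A syllable error rate can therefore exceed a character error rate on the same output. This is only an illustration, not an explanation of the reported numbers.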