Statistical Machine Translation as a Language Model for Handwriting Recognition

Bibliographic Details
Main Authors:	Devlin, J., Kamali, M., Subramanian, K., Prasad, R., Natarajan, P.
Format:	Conference Proceeding
Language:	English
Subjects:	Buildings; Computational modeling; Handwriting recognition; Hidden Markov models; Machine translation; Optical character recognition software; Training; Viterbi algorithm

Description
Summary:	When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model that we use alongside the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language and then create a feature score based on how "difficult" it was to perform the translation. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. In an Arabic handwriting recognition task, we obtained a 0.4% absolute improvement in word error rate (WER) on top of a powerful 5-gram LM.
DOI:	10.1109/ICFHR.2012.273
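The reranking step described in the summary can be pictured as generic log-linear n-best rescoring, in which the translation-derived score is simply one more weighted feature next to the recognizer and n-gram LM scores. This is a minimal sketch under stated assumptions, not the authors' system: the `Hypothesis` class, the `rerank` function, the feature names (`recognizer`, `ngram_lm`, `mt_score`) and the weight values are all hypothetical, and how the MT "difficulty" score is actually computed is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One entry of an OCR n-best list (hypothetical structure)."""
    text: str
    # Log-domain feature scores, e.g. recognizer log-likelihood,
    # n-gram LM log-probability, and an assumed MT-based score.
    features: Dict[str, float]


def rerank(nbest: List[Hypothesis], weights: Dict[str, float]) -> List[Hypothesis]:
    """Sort hypotheses by a weighted sum of their features, best first.

    Every feature named in `weights` must be present on each hypothesis;
    a missing feature raises KeyError rather than being silently ignored.
    """
    def combined_score(h: Hypothesis) -> float:
        return sum(w * h.features[name] for name, w in weights.items())

    return sorted(nbest, key=combined_score, reverse=True)


# Illustrative (made-up) n-best list and weights.
nbest = [
    Hypothesis("hyp_a", {"recognizer": -10.0, "ngram_lm": -20.0, "mt_score": -30.0}),
    Hypothesis("hyp_b", {"recognizer": -10.5, "ngram_lm": -20.0, "mt_score": -25.0}),
]
weights = {"recognizer": 1.0, "ngram_lm": 0.5, "mt_score": 0.2}

print(rerank(nbest, weights)[0].text)                      # hyp_b
print(rerank(nbest, {**weights, "mt_score": 0.0})[0].text)  # hyp_a
```

With the MT weight set to zero, `hyp_a` wins on recognizer and LM scores alone; giving the assumed MT feature a positive weight lets the hypothesis that was "easier" to translate (`hyp_b`) move to the top. In a real system the weights would typically be tuned on held-out data to minimize WER.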