Loading…
Trainable table location in document images
We describe an approach for table location in document images. The documents are described by means of a hierarchical representation that is based on the MXY tree. The presence of a table is hypothesized by searching parallel lines in the MXY tree of the page. This hypothesis is afterwards verified...
Saved in:
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Citations: | Items that cite this one |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | We describe an approach for table location in document images. The documents are described by means of a hierarchical representation that is based on the MXY tree. The presence of a table is hypothesized by searching parallel lines in the MXY tree of the page. This hypothesis is afterwards verified by locating perpendicular lines or white spaces in the region included between the parallel lines. Lastly, located tables can be merged on the basis of proximity and similarity criteria. The use of an optimization method, that relies on the definition of an appropriate table location index, allows us to identify, the optimal values of thresholds involved in the algorithm. In this way the algorithm can be adapted to recognize tables with different features by maximizing the performance on an appropriate training set. The algorithm has been evaluated on two data-sets containing more than 1500 pages, and comparing its results with the tables identified by two commercial OCRs. |
---|---|
ISSN: | 1051-4651 2831-7475 |
DOI: | 10.1109/ICPR.2002.1047838 |