Printed and scanned document authentication using robust layout descriptor matching

Bibliographic Details
Published in: Multimedia tools and applications 2023-10, Vol.83 (16), p.47477-47502
Main Authors: Gomez-Krämer, Petra; Rouis, Kais; Diallo, Azise Oumar; Coustaty, Mickaël
Format: Article
Language: English
Description
Summary: Automatic document authentication is a complex task. The aim is to prove that the document at hand is not fraudulent. This can be achieved through a fingerprint based on the document’s content. To this end, it is necessary to analyze and describe the constituent elements of the document: graphics, text, tables, and the layout. This article focuses on layout description and authentication. The Delaunay layout descriptor (Eskenazi et al. 2015) is a robust descriptor that allows fast comparison and authentication of layouts based on the spatial relationships of the regions composing the document. Because the page layout description requires segmenting the document into regions, the Delaunay layout descriptor cannot match an authentic copy with the original when the number of segmented regions differs between the two documents. This is mainly due to the use of a global matching approach. To overcome this drawback, we present a new refined matching algorithm for the Delaunay layout descriptor that combines global and local matching. Furthermore, we present a storage and retrieval scheme to match a Delaunay layout descriptor efficiently against a layout database. In addition to being able to compare layouts with different numbers of segmented regions, the proposed method outperforms related work. We obtain false negative and false positive rates of 0.011 and 0.0, respectively, for a data set of printed and scanned layouts, and of 0.3978 and 0.0029 for a data set of real documents.
ISSN: 1573-7721, 1380-7501
DOI: 10.1007/s11042-023-17021-1
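
Note on the reported rates: the summary gives false negative and false positive rates without stating how they are computed. The sketch below shows one common reading. It assumes that a "positive" is an authentic copy that should match its original, so a false negative is an authentic copy that was rejected and a false positive is a non-authentic layout that was accepted. The function name `error_rates` and the boolean encoding are illustrative choices, not taken from the article.

```python
def error_rates(is_authentic, was_accepted):
    """Return (false_negative_rate, false_positive_rate).

    is_authentic: iterable of bools, True if the test layout is an authentic copy.
    was_accepted: iterable of bools, True if the matcher accepted the layout.
    Assumed convention: positive = authentic copy.
    """
    false_neg = false_pos = n_pos = n_neg = 0
    for authentic, accepted in zip(is_authentic, was_accepted):
        if authentic:
            n_pos += 1
            if not accepted:
                false_neg += 1  # authentic copy rejected
        else:
            n_neg += 1
            if accepted:
                false_pos += 1  # non-authentic layout accepted
    # Guard against empty classes to avoid division by zero.
    fnr = false_neg / n_pos if n_pos else 0.0
    fpr = false_pos / n_neg if n_neg else 0.0
    return fnr, fpr


# Toy data: 4 authentic copies (1 rejected), 2 non-authentic layouts (1 accepted).
truth = [True, True, True, True, False, False]
decisions = [True, True, True, False, False, True]
fnr, fpr = error_rates(truth, decisions)
```

Under this reading, a false positive rate of 0.0 would mean that no non-authentic layout in the test set was accepted.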