We are happy to announce that the following paper has been accepted
for publication at the 11th International Conference on Document
Analysis and Recognition (ICDAR), which will take place in Beijing, China,
on September 18 - 21, 2011:
Guillaume Lazzara (1), Roland Levillain (1), Thierry Géraud (1),
Yann Jacquelet (1), Julien Marquegnies (1), Arthur Crépin-Leblond (1)
A Free Software Framework
for Document Image Analysis
http://publis.lrde.epita.fr/201109-ICDAR
(1) EPITA Research and Development Laboratory (LRDE)
Electronic documents are becoming more and more usable thanks to better and
more affordable network, storage and computational equipment. But in
order to benefit from computer-aided document management, paper
documents must be digitized and analyzed. This task may be challenging
at several levels. Data may be of multiple types, thus requiring
different adapted processing chains. The tools to be developed should
also take into account the needs and knowledge of users, ranging from a
simple graphical application to a complete programming framework.
Finally, the data sets to process may be large. In this paper, we present
a set of features that a Document Image Analysis framework should
provide to address these issues. These ideas are implemented as
an open source module built on top of a generic and efficient image
processing platform. Our solution features services such as
preprocessing filters, text detection, page segmentation and document
reconstruction (as XML, PDF or HTML documents). This framework, composed
of reusable software components, can be used to create full-fledged
graphical applications, small utilities, or processing chains to be
integrated into third-party projects.