Eduardo

2012/5/23 Guillaume Lazzara <lazzara@lrde.epita.fr>

Dear Eduardo,

On 05/14/2012 07:53 PM, Eduardo Basterrechea wrote:
> I read the website, but I can't find information about whether you use
> linguistic data in OCR, and whether you can OCR Spanish texts.
>
> Thanks for the project; it seems to be a great product!

Thanks for your interest in our project!
The Scribo module's main task is to detect and extract structure and data
from documents.

We apply image processing to document images and try to OCR the detected
text regions. For OCR, we use the open source project Tesseract, which
supports many languages, including Spanish.

In Scribo, the functions that call the OCR let the user choose which
language to use for recognition. For Spanish, pass "spa" as the language
argument.
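As a sketch only (an assumption, not Scribo's own API): "spa" is the language code Tesseract uses for its Spanish data, and when Tesseract is run directly the language is selected with its `-l` option. Assuming the `tesseract` command-line tool and its Spanish data file (spa.traineddata) are installed, the invocation can be built as below; the helper name `tesseract_command` and the file names are hypothetical.

```python
import shlex


def tesseract_command(image_path: str, output_base: str, lang: str = "spa") -> list[str]:
    """Build a Tesseract CLI invocation (hypothetical helper).

    Tesseract's usage is `tesseract imagename outputbase [options]`;
    `-l spa` selects the Spanish language data. Recognized text is
    written to `<output_base>.txt` when the command is actually run.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


# Only builds and prints the command; it does not run Tesseract.
cmd = tesseract_command("page.png", "page")
print(shlex.join(cmd))  # tesseract page.png page -l spa
```

To run it, pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where Tesseract and its Spanish data are available.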

For the moment, we do not use any other linguistic data in OCR, such as
a dictionary or semantic post-processing, to improve results.

Let us know if you need more information.

Best regards,

--
Guillaume

Eduardo Basterrechea Molina
@ebaste
Founder and CEO
Nanclares de Oca 1F Bajo F
28022 Madrid
Tel: +34 91 3292318
ebaste@molinodeideas.es

Molino de ideas | Onoma | Molinolabs | Gominolabs | Molinarium | Refranario | Dictio | Fonemolabs | Face-Molino | @Molinodeideas