[Gnome-OCR] Integration of Tesseract-OCR... (fwd)

From: Alan Horkan <horkana_at_maths.tcd.ie>
Date: Mon Nov 27 2006 - 12:55:54 CET

Forwarded from the Gnome Desktop Developer mailing list.

I expect you, the Abiword developers, will show your usual enthusiasm
for this and for other efforts to help improve Abiword. The lecturer has
good plans but a cautious outlook, and he recognises that it may be some
time before this becomes relevant to Abiword. Still, we can offer
encouragement until then.

Sincerely

Alan Horkan

http://advogato.org/person/AlanHorkan/
http://alanhorkan.livejournal.com/

---------- Forwarded message ----------
Date: Mon, 27 Nov 2006 10:55:06 +0100
From: Emmanuel Fleury <fleury@labri.fr>
To: desktop-devel-list@gnome.org
Subject: [Gnome-OCR] Integration of Tesseract-OCR...

Hi all,

As I'm pretty new here, please forgive me if this is not the right
place. :)

I am an associate professor at Bordeaux-I University (France), and I
have submitted a student project on integrating tesseract-OCR into
Gnome (the student project starts only in January).

My plan is more or less to:
- have them develop a libgnome-ocr as a wrapper around tesseract-OCR,
- clean up the code of tesseract-OCR,
- refactor tesseract-OCR to use the Gnome libs and,
- try to add some extra features.

(I think the students will stop at the first item, but you never know!)

I don't know exactly what the API should be or how such a library could
be used, but with the help of Étienne Bersac (the author of
libgnome-scan) I thought of a few examples:
- A plug-in for Abiword (also outputting formatting information);
- A plug-in for e-mail readers (image spam analysis);
- ... and so forth ...

I guess that the API should include an initialization function and ways
to set the image input format, the output format, and some settings
(recognition strategy, drawing recognition, where to store the output,
etc.).
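
To make the API outline above a little more concrete, here is a minimal
sketch in C of what such a context object might look like. Every name in
it (GocrContext, gocr_context_new, the enum values, and so on) is
hypothetical and invented for illustration; none of it is an existing
libgnome-ocr or Tesseract interface, and the defaults are arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; this is not an existing API. */

typedef enum { GOCR_INPUT_TIFF, GOCR_INPUT_PNG } GocrInputFormat;
typedef enum { GOCR_OUTPUT_PLAIN_TEXT, GOCR_OUTPUT_FORMATTED } GocrOutputFormat;
typedef enum { GOCR_STRATEGY_FAST, GOCR_STRATEGY_ACCURATE } GocrStrategy;

/* One context holds every setting mentioned in the outline. */
typedef struct {
    GocrInputFormat  input_format;       /* image input format */
    GocrOutputFormat output_format;      /* plain or formatted text */
    GocrStrategy     strategy;           /* recognition strategy */
    int              recognize_drawings; /* non-zero enables drawing recognition */
    char            *output_path;        /* where to store output; NULL = unset */
} GocrContext;

/* Initialization function: allocate a context with arbitrary defaults.
 * Returns NULL if allocation fails. */
GocrContext *gocr_context_new(void)
{
    GocrContext *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->input_format = GOCR_INPUT_TIFF;
    ctx->output_format = GOCR_OUTPUT_PLAIN_TEXT;
    ctx->strategy = GOCR_STRATEGY_ACCURATE;
    return ctx;
}

/* Store a private copy of the output path. Passing NULL clears it.
 * Returns 0 on success and -1 if allocation fails. */
int gocr_context_set_output_path(GocrContext *ctx, const char *path)
{
    char *copy = NULL;

    assert(ctx != NULL);
    if (path != NULL) {
        size_t len = strlen(path) + 1;
        copy = malloc(len);
        if (copy == NULL)
            return -1;
        memcpy(copy, path, len);
    }
    free(ctx->output_path);
    ctx->output_path = copy;
    return 0;
}

/* Release the context and the path it owns. Accepts NULL. */
void gocr_context_free(GocrContext *ctx)
{
    if (ctx == NULL)
        return;
    free(ctx->output_path);
    free(ctx);
}
```

A caller would create a context, adjust the fields or call the setter,
hand the context to whatever recognition entry point the library ends
up providing, and then free it. Keeping the settings in one object
rather than in global state would let a plug-in run several
recognitions with different options side by side.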

For now, while waiting for the project to start in January, I'm trying
to port Tesseract-OCR to 64-bit platforms... and I'm a bit horrified by
the way they handle basic types and data structures... My guess is that
a lot of cleaning up is needed there. :-/

Anyway, is this project of interest to the Gnome community? Do you have
any comments, advice, or objections?

Regards

-- 
Emmanuel Fleury              | Office: 261
Associate Professor,         | Phone: +33 (0)5 40 00 69 34
LaBRI, Domaine Universitaire | Fax:   +33 (0)5 40 00 66 69
351, Cours de la Libération  | email: emmanuel.fleury@labri.fr
33405 Talence Cedex, France  | URL: http://www.labri.fr/~fleury
Received on Mon Nov 27 12:58:45 2006
