Re: Commit: PDF import plugin

From: Dom Lachowicz <domlachowicz_at_yahoo.com>
Date: Fri Mar 18 2005 - 00:49:46 CET

> I'm just wondering if it would be valid to ask the Poppler
> user community to help develop PDF->RTF, instead of just you
> (or other AbiFolk) working on making the PDF->TXT support
> what we want or developing a custom PDF->ABW.

I don't think that the Poppler folks are interested in
this use case, other than "wouldn't it be cool if
someone else did it". But I could be wrong; someone is
welcome to ask them. I can see the KWord and OOo teams
caring, though.

As for mapping PDF to RTF, PDF doesn't have any
semantic information to speak of that word processor
formats care about. It's just a vector graphic that
uses a brain-damaged format. Not that SVG is all that
much saner...

The PDF->TEXT converter tries to reconstruct at least
the logical text order. I can certainly help produce a
PDF->RTF converter by stealing liberally from the
PDF->TEXT converter. The KWord folk would probably be
interested in it. With a bit of sneakiness, it should
be possible to largely preserve the following:

*) Font names
*) Font sizes
*) Font styles (bold, italic)
*) Images
*) Certain annotations, preserved as RTF comments
*) Colors
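To make the list above a bit more concrete, here is a rough
Python sketch of what the RTF-emitting side of such a converter
might look like, assuming the PDF side has already produced a
list of styled text runs. TextRun, rtf_escape and runs_to_rtf
are hypothetical names, not anything in Poppler or AbiWord;
images and annotations are left out. Two RTF details it relies
on: font sizes are given in half-points (\fs24 is 12 pt), and
entry 0 of the color table is conventionally left empty as the
default color.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    # Hypothetical record for one styled span pulled out of a PDF page.
    text: str
    font: str                            # font name as reported by the PDF
    size_pt: float                       # font size in points
    bold: bool = False
    italic: bool = False
    rgb: tuple[int, int, int] = (0, 0, 0)


def rtf_escape(s: str) -> str:
    """Escape RTF specials and encode non-ASCII text as \\uN? sequences."""
    out = []
    for ch in s:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par ")
        elif ord(ch) > 127:
            # \uN takes a signed 16-bit UTF-16 code unit; '?' is the
            # fallback character for readers that don't understand \u.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{unit}?")
        else:
            out.append(ch)
    return "".join(out)


def runs_to_rtf(runs: list[TextRun]) -> str:
    # Build the font and color tables in order of first appearance.
    fonts: list[str] = []
    colors: list[tuple[int, int, int]] = []
    for r in runs:
        if r.font not in fonts:
            fonts.append(r.font)
        if r.rgb not in colors:
            colors.append(r.rgb)
    fonttbl = "".join(f"{{\\f{i} {rtf_escape(n)};}}" for i, n in enumerate(fonts))
    # Leading ';' leaves color entry 0 empty, so real colors start at 1.
    colortbl = ";" + "".join(f"\\red{r}\\green{g}\\blue{b};" for r, g, b in colors)
    body = []
    for r in runs:
        fmt = (f"\\f{fonts.index(r.font)}"
               f"\\fs{round(r.size_pt * 2)}"          # half-points
               f"\\cf{colors.index(r.rgb) + 1}")
        if r.bold:
            fmt += "\\b"
        if r.italic:
            fmt += "\\i"
        # Each run is its own group, so \b and \i don't leak into the next run.
        body.append("{" + fmt + " " + rtf_escape(r.text) + "}")
    return ("{\\rtf1\\ansi\\deff0{\\fonttbl" + fonttbl + "}"
            "{\\colortbl" + colortbl + "}\n" + "".join(body) + "\\par}")


runs = [TextRun("Hello ", "Times", 12),
        TextRun("world", "Helvetica", 10.5, bold=True, rgb=(255, 0, 0))]
rtf = runs_to_rtf(runs)
```

Wrapping every run in its own {...} group is the lazy choice:
it produces more verbose RTF than toggling \b0 and \i0, but it
means each run's formatting is independent of the one before it.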

By doing multiple passes + some heuristics, it may be
possible to reconstruct some semantic layouts (e.g.
columns or sections), but I don't imagine this
happening anytime soon.
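As an illustration of the "multiple passes" idea, here is a
deliberately naive Python sketch: one pass looks for empty
vertical strips (gutters) between columns, and a second reads
each column top to bottom, grouping boxes that share a baseline
into lines. Box, find_columns, reading_order and the threshold
values are all hypothetical, and none of it is taken from the
PDF->TEXT converter.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical text box in PDF user space: y grows upward,
    # so the top of the page has the largest y.
    x0: float
    x1: float
    y: float        # baseline
    text: str


def find_columns(boxes: list[Box], min_gap: float) -> list[list[float]]:
    """Pass 1: merge horizontal extents; a gap wider than min_gap is a gutter."""
    spans = sorted((b.x0, b.x1) for b in boxes)
    cols = [list(spans[0])]
    for x0, x1 in spans[1:]:
        if x0 - cols[-1][1] > min_gap:
            cols.append([x0, x1])                     # new column starts here
        else:
            cols[-1][1] = max(cols[-1][1], x1)        # extend current column
    return cols


def reading_order(boxes: list[Box], min_gap: float = 20.0,
                  line_tol: float = 2.0) -> list[str]:
    """Pass 2: read columns left to right, each one top to bottom."""
    if not boxes:
        return []
    out = []
    for left, right in find_columns(boxes, min_gap):
        members = [b for b in boxes if left <= b.x0 <= right]
        members.sort(key=lambda b: (-b.y, b.x0))      # top first, then left
        lines: list[tuple[float, list[Box]]] = []
        for b in members:
            if lines and abs(lines[-1][0] - b.y) <= line_tol:
                lines[-1][1].append(b)                # same baseline
            else:
                lines.append((b.y, [b]))
        for _, line in lines:
            # Baselines within tolerance may be out of x order; fix that.
            line.sort(key=lambda b: b.x0)
            out.append(" ".join(b.text for b in line))
    return out


page = [Box(10, 90, 700, "Left"), Box(10, 90, 688, "column."),
        Box(150, 230, 700, "Right"), Box(150, 230, 688, "side.")]
order = reading_order(page)
```

This also shows why it won't happen soon: a single full-width
heading bridges the gutter, so find_columns merges the whole page
into one column. Anything robust needs more passes (e.g. finding
gutters separately for each horizontal band of the page).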

Best,
Dom

                
Received on Fri Mar 18 00:50:27 2005

This archive was generated by hypermail 2.1.8 : Fri Mar 18 2005 - 00:50:27 CET