Re: Filters [Was Re: AbiWord 0.3.4]

Justin Bradford (justin@ukans.edu)
Thu, 25 Feb 1999 11:44:57 -0600 (CST)


> I was thinking, rather than putting a whole lot of effort into writing
> filters, why not write a simple description language for converting
> files. (Or even use part of an existing one, eg. based on the regex
> library)
...
> But, then again, I don't know how you'd tackle the extremely binary file
> formats with embedded objects in them :(

Most of these file formats are too complicated for a description
language to handle. Something like that might work for HTML and RTF,
since RTF is similar in design to HTML: both basically amount to tags
surrounding the text.
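
As a rough illustration of why a table-driven description is plausible
for tag-style formats, here is a minimal sketch in Python. Everything in
it is an assumption made up for the example: the TAGS table, the
convert function, and the tiny subset of RTF control words it knows
about (\b, \i, \par). It ignores RTF groups, fonts, and everything else
a real filter would need.

```python
import re

# Hypothetical description table: (control word, numeric parameter) -> HTML.
# A parameter of None means the control word appeared without a number.
TAGS = {
    ("b", None): "<b>",
    ("b", "0"): "</b>",
    ("i", None): "<i>",
    ("i", "0"): "</i>",
    ("par", None): "<br>",
}

# An RTF control word: backslash, lowercase letters, an optional numeric
# parameter, and an optional single space that acts as a delimiter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")

def convert(fragment):
    """Replace known control words with HTML tags; drop unknown ones."""
    def replace(match):
        key = (match.group(1), match.group(2))
        return TAGS.get(key, "")
    return CONTROL_WORD.sub(replace, fragment)

print(convert(r"\b bold\b0  text\par"))
```

The point of the sketch is that the conversion logic stays fixed while
the table does the describing, which only works because the format is
already a stream of tags and text.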

Word 8 takes 500 KB of text and tables just to describe, and that
doesn't even include the OLE2 storage object format it is stored in.
A template won't work for it.

And speaking of Word 8 files, I'm adapting the MSWordview program to
become a library. Then a relatively small function (<16,000 lines) can be
written that calls the library to extract the information, loops through
all of the pieces, and deals with the various types of things it
encounters (control characters, footnotes, tables, objects, etc.) as it
goes through the file.
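
The shape of that function could look something like the following
Python sketch. This is not the MSWordview API: the list of
(kind, payload) pairs stands in for whatever the library would actually
hand back, and the handler names, the HANDLERS table, and the output
format are all hypothetical.

```python
# Hypothetical handlers, one per kind of thing found in the file.
def handle_text(payload, out):
    out.append(payload)

def handle_footnote(payload, out):
    out.append("[footnote: %s]" % payload)

def handle_table(payload, out):
    # Assumes a table row arrives as a list of cell strings.
    out.append(" | ".join(payload))

def handle_object(payload, out):
    out.append("[embedded object: %s]" % payload)

HANDLERS = {
    "text": handle_text,
    "footnote": handle_footnote,
    "table": handle_table,
    "object": handle_object,
}

def convert_pieces(pieces):
    """Loop through the pieces and dispatch each one to its handler."""
    out = []
    for kind, payload in pieces:
        handler = HANDLERS.get(kind)
        if handler is None:
            continue  # skip anything this sketch does not know about
        handler(payload, out)
    return "\n".join(out)

pieces = [
    ("text", "Hello"),
    ("footnote", "see p. 2"),
    ("table", ["a", "b"]),
    ("mystery", None),
]
print(convert_pieces(pieces))
```

The hard part stays in the library that produces the pieces; the caller
is just a loop plus one handler per type.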

Justin Bradford
justin@ukans.edu


