Most of the file formats are too complicated to be captured by a
description language.
Maybe something like that would work for HTML and RTF, since RTF is
similar in design to HTML. Both have what basically amount to tags
surrounding the text.
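
To make the "tags surrounding the text" point concrete, here is a rough
sketch in Python of stripping the markup from each format with one pattern
apiece. The function names and patterns are my own assumptions for
illustration, and they are deliberately naive: real HTML and RTF have
plenty of cases (entities, escaped braces, embedded binary data) that this
ignores.

```python
import re

# Hypothetical, deliberately naive patterns: one per format.
# An HTML tag: anything between angle brackets.
HTML_TAG = re.compile(r"<[^>]*>")
# An RTF control word: backslash, letters, optional numeric parameter,
# and an optional single space that ends the control word.
RTF_CONTROL = re.compile(r"\\[a-zA-Z]+-?\d* ?")


def strip_html(text):
    """Remove anything that looks like an HTML tag."""
    return HTML_TAG.sub("", text)


def strip_rtf(text):
    """Remove RTF control words, then the group braces around the text."""
    no_controls = RTF_CONTROL.sub("", text)
    return no_controls.replace("{", "").replace("}", "")


print(strip_html("<p>Hello <b>world</b></p>"))
print(strip_rtf(r"{\rtf1 Hello {\b world}}"))
```

Both calls reduce to the same plain text, which is why a shared
description might cover these two formats.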
The Word 8 format takes 500KB of text and tables just to describe, and
that doesn't even include the OLE2 storage object format the file is
stored in. A template won't work for it.
And speaking of Word 8 files, I'm adapting the MSWordview program to
become a library. Then a relatively small function (<16,000 lines) can be
written that calls the library functions to extract the information, loops
through all of the pieces, and deals with the various types of things it
encounters (control characters, footnotes, tables, objects, etc.) as it
goes through the file.
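
The shape of that function might look something like the sketch below.
This is Python for brevity, not the MSWordview code or its API: the Piece
type, the handler names, and the kind strings are all hypothetical, and
the pieces are assumed to have already been extracted by the library.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    """Hypothetical unit handed back by the extraction library."""
    kind: str  # e.g. "text", "footnote", "table", "object"
    data: str


def handle_text(piece):
    return piece.data


def handle_footnote(piece):
    return f"[footnote: {piece.data}]"


def handle_table(piece):
    return f"[table: {piece.data}]"


def handle_unknown(piece):
    # Anything without a dedicated handler is flagged, not silently dropped.
    return f"[unsupported {piece.kind}]"


# One handler per type of thing the loop can encounter.
HANDLERS = {
    "text": handle_text,
    "footnote": handle_footnote,
    "table": handle_table,
}


def convert(pieces):
    """Loop through the pieces and dispatch each one on its kind."""
    out = []
    for piece in pieces:
        handler = HANDLERS.get(piece.kind, handle_unknown)
        out.append(handler(piece))
    return "".join(out)
```

Supporting a new type of content then means adding one handler and one
table entry, while the loop itself stays the same.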
Justin Bradford
justin@ukans.edu