Re: Word Exporter Project


Subject: Re: Word Exporter Project
From: Justin Bradford (justin@ukans.edu)
Date: Thu Mar 02 2000 - 15:18:05 CST


> Could someone with access to the Word2000 UI confirm that documents are
> saved by default in the binary format? I suspect that they still are, and
> this HTML mess is a secondary alternative. However, it'd be good to know
> for sure.

Binary by default. The HTML mess is part of their web-integration
(assimilation) effort. The idea is that one could write a document, put
it on the web (or an intranet), and then someone else could come along,
grab the document, import it into Word, and keep all of the "advanced"
formatting/style info. Basically, it's meant to make doc->html->doc lossless.

> In many ways, writing the exporter should be easier than the importer, since
> we don't have to skip over everything we don't understand. We can just
> write out a clean document using the features we do support.
>
> I don't mean to trivialize how large a job this is, but we've already got a
> big head start.

Yes, an exporter would not be too difficult at this point. It would be
time-consuming, but if you know the format, it's fairly straightforward
using the wv framework.

> You proposed an exporter to an unusual new text-only format, which doesn't
> help with the binary round-trip problem. Or were you also thinking of
> starting a project to import this mess, too (to solve that round-trip
> problem)?

True, the Word "XML" format is really no better than RTF as far as going
from AbiWord to Word is concerned. It definitely does not replace the
need for a binary exporter, but that doesn't make it a bad idea. As
Word 2000 usage increases, support for this format could become
increasingly important. However, considering how Word users will likely
use it (publishing to intranets and the Internet), import of the format
is probably more important than export.

There are probably more "important" things to work on, but it's not like
supporting this format is bad. Maybe it will be a useful introduction to
working on the binary format.

And on the binary export front, there's a lot to learn, but someone could
begin writing "write" functions to complement the "read" functions in wv
without much introduction. Writing a CHP, PAP, and SEP (character,
paragraph, and section property) compressor wouldn't be too bad, either.
Also, modifying wv to abstract reading/writing and modularize the OLE2
code (and switch it to libole2) requires practically no knowledge of the
actual format. It's mostly just tedious.
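To give a rough idea of what a "write" function mirroring a "read" function, and a property "compressor", might look like, here is a minimal C sketch. Everything in it is hypothetical: SimpleCHP, the OP_* opcodes, and the helper names are not wv's API, and the opcodes are placeholders rather than the real sprm values from the Word binary spec. The only idea it illustrates is that a compressor stores just the properties that differ from the base style as (opcode, operand) pairs, and the reader replays those pairs on top of the base style.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified character properties (a tiny CHP subset). */
typedef struct {
    int bold;             /* 0 or 1 */
    int italic;           /* 0 or 1 */
    uint16_t half_points; /* font size in half-points, e.g. 24 = 12pt */
} SimpleCHP;

/* Placeholder opcodes for illustration only; NOT real Word sprm values. */
enum { OP_BOLD = 0x0001, OP_ITALIC = 0x0002, OP_SIZE = 0x0003 };

/* Reading side: decode a 16-bit little-endian value. */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Writing side: the mirror image of read_u16_le. */
static void write_u16_le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Append one (opcode, operand) pair at offset len; returns the new length. */
static size_t emit(uint8_t *out, size_t len, uint16_t op, uint16_t operand)
{
    write_u16_le(out + len, op);
    write_u16_le(out + len + 2, operand);
    return len + 4;
}

/* "Compress" a CHP: emit only properties that differ from the base style.
 * out must hold at least 12 bytes (three pairs). Returns bytes written. */
static size_t compress_chp(const SimpleCHP *base, const SimpleCHP *chp,
                           uint8_t *out)
{
    size_t len = 0;
    assert(base != NULL && chp != NULL && out != NULL);
    if (chp->bold != base->bold)
        len = emit(out, len, OP_BOLD, (uint16_t)chp->bold);
    if (chp->italic != base->italic)
        len = emit(out, len, OP_ITALIC, (uint16_t)chp->italic);
    if (chp->half_points != base->half_points)
        len = emit(out, len, OP_SIZE, chp->half_points);
    return len;
}

/* The reading counterpart: replay the pairs on a copy of the base style. */
static SimpleCHP apply_chp(const SimpleCHP *base, const uint8_t *in, size_t len)
{
    SimpleCHP chp = *base;
    size_t i;
    for (i = 0; i + 4 <= len; i += 4) {
        uint16_t op = read_u16_le(in + i);
        uint16_t operand = read_u16_le(in + i + 2);
        switch (op) {
        case OP_BOLD:   chp.bold = operand; break;
        case OP_ITALIC: chp.italic = operand; break;
        case OP_SIZE:   chp.half_points = operand; break;
        default:        break; /* unknown opcode: skip it */
        }
    }
    return chp;
}
```

Compressing a run that is bold and 16pt against a plain 12pt base yields two pairs (8 bytes), and apply_chp() turns them back into the original properties. A run identical to its base style costs zero bytes. A real exporter would also need variable-length operands and the actual opcode table from the format documentation.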

I'm going to write this stuff eventually, but until I have time, I'd be
more than happy to point people to the relevant bits of documentation and
code.

Justin



This archive was generated by hypermail 2b25 : Thu Mar 02 2000 - 15:18:12 CST