Re: Word Exporter Project


Subject: Re: Word Exporter Project
From: Paul Rohr (paul@abisource.com)
Date: Thu Mar 02 2000 - 14:44:33 CST


At 01:50 PM 3/2/00 -0600, sam th wrote:
>I appear to have failed to make one important fact clear - I HATE this
>format. It looks bad, it breaks lots of different standards that I like,
>and it's just plain UGLY. But, shockingly enough, it's open. Not only is
>it a text format (it is HTML) but MSFT has even published the pseudo-DTD.

Oops. MSFT dangled some ugly bait under a pretty name, and you bit. (Don't
be ashamed. Over the past decade, it's happened to almost everyone in the
industry. When I put my marketing hat on, I'm totally impressed with how
good MSFT is at this.)

What's so "open" about the following story?

  - Hi, we're MSFT, the world's biggest, baddest software company.
  - We introduce new, incompatible file formats with every release.
  - Everyone *has* to upgrade to read those documents.
  - Each of those formats is uglier than the previous one.
  - Oh yeah, here are DTDs (no code) for our latest ugly format.

What are they telling the world about compatibility? Here's this atrocious
new file format that everyone we already tricked into upgrading can use.
Everyone else can go suck eggs. Yeah, it's text-based rather than binary
(which should make it somewhat easier to implement, especially if the
documentation is accurate), but who really benefits from this vaunted new
"open" compatibility?

  - people who paid for Office 2000? yes
  - users of every other version of Word to date? nope
  - users of every other existing word processor? nope

  - Microsoft? definitely, they sell a bunch of upgrades
  - every other word processor author? no

Sorry, that's not the kind of "openness" I want to endorse. If they'd done
the work to create a *clean* format (like .ABW for instance ;-), that might
be a different story -- especially if they gave away all the necessary code
to implement that format (like Caolan's doing with libwv, and we're doing
with AbiWord).

But that's not what happened, is it? ;-)

>The Word 97-2000 format, which is the prevalent binary format, is none of
>the above.

Could someone with access to the Word2000 UI confirm that documents are
saved by default in the binary format? I suspect that they still are, and
this HTML mess is a secondary alternative. However, it'd be good to know
for sure.

>I have heard rumors of a Word 97 file format spec available,
>but have never been able to find one. This means that writing an exporter
>for the mess that is the Word 2000 XML wreckage is much easier than
>writing a binary exporter. It should not surprise anyone that all of the
>formats we export are text-based. Especially for a not-so-good programmer
>like me, this is a much more manageable task.
>
>I wish it wasn't like this. If I had a spec for the W97 file format, I
>would work on that instead. But I don't. And I would like to get this
>piece of compatibility to work, at least a little.

Caolan and company have encapsulated a *ton* of knowledge about various Word
binary file formats in libwv, which we're already using for our importer.
The plan all along has been to use that engine to run the process in
reverse.

In many ways, writing the exporter should be easier than the importer, since
we don't have to skip over everything we don't understand. We can just
write out a clean document using the features we do support.

I don't mean to trivialize how large a job this is, but we've already got a
big head start.

>> Did you mean to suggest that this particular Word format should be
>> write-only? If so, why?
>>
>I'm not sure what I wrote to give you that impression, but I meant no such
>thing. It's just that exporters are where we are deficient (with regard to
>Word).

Sorry for the confusion. We already import Word binary formats. To get
internal round-trip fidelity, we need an exporter for at least one of those
binary formats.

You proposed an exporter to an unusual new text-only format, which doesn't
help with the binary round-trip problem. Or were you also thinking of
starting a project to import this mess, too (to solve that round-trip
problem)?

Having a binary importer and a text exporter doesn't help much. In fact,
all we'd have done is take documents from people using other versions of
Word and force those people to upgrade to Word 2000. That's the absolute
*last* thing we want to do.

>Hope this clears things up.

Me too. Thanks.

Paul


