Subject: Re: XHTML
From: Paul Rohr (paul@abisource.com)
Date: Wed Jan 26 2000 - 17:05:47 CST


Sam,

It looks like you're asking two different questions here:

1. XHTML (easy)
-----------------
Does anyone object to having an awesomely thorough XHTML importer and
exporter? I can't see why anyone would. XHTML looks like a clean new
format that people may eventually start supporting elsewhere.

2. old-style HTML (???)
------------------------
What do we do about old-style HTML? This is the more contentious question.
There are massive quantities of this content out there in the world, and
writing tolerant parsers which "properly" handle it is very, very, very,
very, very hard. Did I mention that it's hard? :-)
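
To make the contrast concrete, here is a minimal sketch (not anything from
our tree) using Python's standard xml.etree.ElementTree. The helper name
is_well_formed and both sample strings are made up for illustration. A
strict XML parser accepts a well-formed XHTML-style fragment as-is, but
rejects typical tag soup outright, which is roughly why a tolerant HTML
importer needs so much extra recovery logic:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup unchanged."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: every element in the first one is explicitly closed.
xhtml_style = "<html><body><p>one<br/>two</p></body></html>"

# Old-style HTML habits: an unclosed <br> and <p> elements left open.
tag_soup = "<html><body><p>one<br>two<p>three</body></html>"

print(is_well_formed(xhtml_style))
print(is_well_formed(tag_soup))
```

The strict parser only reports that the second sample is broken; it makes
no attempt to guess what the author meant, and that guessing is the hard
part.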

<rant>
In fact, I'd be willing to go out on a limb and assert that reliably
importing all the HTML misfeatures out there in the world may be *harder*
than reliably importing Word documents. At least the Word family of file
formats is deterministic. There are only a handful of discrete binaries
producing content in that format, so eventually all of its quirks can be
reverse-engineered. HTML may look simpler, but it's a mess.
</rant>

This is why we don't have, and may never have, an HTML importer.

However, having an HTML *exporter*, even a dumb one, is very useful. Since
that format is so widespread, being able to export *our* content in it is
important. More specifically, what matters is being able to export our
content in *some* form that existing browsers can read.

bottom line
-----------
I think the reason you've been getting so much static here is that you've
been proposing to morph the existing HTML exporter into an XHTML exporter.

Since our impexp framework is so modular, why not have both for now? If
you turn out to be right that old-style HTML is no longer relevant because
it's been supplanted by XHTML, then we can just drop the old HTML exporter
and (perhaps) drop the X from the name of the new code.

Paul


