Re: XHTML


Subject: Re: XHTML
From: sam th (sam@bur-jud-118-039.rh.uchicago.edu)
Date: Wed Jan 26 2000 - 17:11:16 CST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, 26 Jan 2000, Paul Rohr wrote:

> Sam,
>
> It looks like you're asking two different questions here:
>
> 1. XHTML (easy)
> -----------------
> Does anyone object to having an awesomely thorough XHTML importer and
> exporter? I can't see why anyone would. XHTML looks like a clean new
> format that people may eventually start supporting elsewhere.
This sounds good, and I agree it is something AbiWord should do.

>
> 2. old-style HTML (???)
> ------------------------
> What do we do about old-style HTML? This is the more contentious question.
> There are massive quantities of this content out there in the world, and
> writing tolerant parsers which "properly" handle it is very, very, very,
> very, very hard. Did I mention that it's hard? :-)
>
> <rant>
> In fact, I'd be willing to go out on a limb and assert that reliably
> importing all the HTML misfeatures out there in the world may be *harder*
> than reliably importing Word documents. At least the Word family of file
> formats is deterministic. There are only a handful of discrete binaries
> producing content in that format, so eventually all of its quirks can be
> reverse-engineered. HTML may look simpler, but it's a mess.
> </rant>
>
> This is why we don't have, and may never have, an HTML importer.
This also seems true. I was just reading an article on the mess that is
current HTML out there in the real world.

>
> However, having an HTML *exporter*, even a dumb one, is very useful. Since
> that format is so prolific, being able to export *our* content in that
> format is important. More specifically, being able to export our content in
> *some* form that those browsers can read is what's important.
>
> bottom line
> -----------
> I think the reason you've been getting so much static here is that you've
> been proposing to morph the existing HTML exporter into an XHTML exporter.
>
> Since our impexp framework is so modular, why not have both for now? If
> you're right about the idea that old-style HTML is no longer relevant,
> because it's been supplanted by XHTML, then we can just drop that code, and
> (perhaps) drop the X from the name of the new code.
>

The reason that I don't think we should have both is that the
difference is so minimal. The changes that I listed in my previous email
are the only ones (as far as I can tell) that we would have to make to
change our HTML exporter into a conforming XHTML exporter. And the only
hard change I listed deals with <br>, which is something we should fix
even if we stick with HTML (it's already marked as a bug). None of the
changes that I suggest prevents existing renderers from properly
displaying AbiWord-generated HTML. They just allow our output to be
displayed in XHTML browsers (not that any exist). If, in the future,
AbiWord develops the sort of features where the difference between HTML
4.01 (which is the standard we should shoot for) and XHTML is meaningful,
then maybe we should fork the exporter code. But for now, the differences
in the realm of text rendering are minimal, and the W3C has even provided
a guide to making XHTML code compatible with HTML. (That's where I got my
suggestions.)
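
To illustrate the <br> point, here is a minimal sketch in Python. It is
not AbiWord's exporter code, empty_tag is a made-up helper, and the full
change list from the previous email is not reproduced here. It assumes
the convention from the W3C compatibility guidelines of writing empty
elements with a space before the closing "/>", plus XHTML's lowercase
names and quoted attribute values:

```python
# Hypothetical helper, not AbiWord code: emit an empty element in a form
# that is valid XHTML and still readable by existing HTML browsers.
from xml.sax.saxutils import quoteattr


def empty_tag(name, attrs=None):
    """Return an empty element such as <br /> with lowercase names."""
    parts = [name.lower()]
    for key, value in (attrs or {}).items():
        # XHTML requires every attribute value to be quoted and escaped.
        parts.append(f"{key.lower()}={quoteattr(value)}")
    # The space before "/>" follows the HTML compatibility guidelines.
    return "<" + " ".join(parts) + " />"
```

Calling empty_tag("BR") gives <br />, which XHTML accepts as a closed
element and which existing HTML browsers should still treat as a line
break.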

> Paul
>

           
                                     sam th
                                     sytobinh@uchicago.edu
                                        
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE4j38Vt+kM0Mq9M/wRAmjGAJ91pxD9tFbV9oL5Q2c87p+pl+BIDwCgm77L
aGvuWvqYqegxmISwabpAxvs=
=Oc6N
-----END PGP SIGNATURE-----
