Re: AbiWord DTD


From: sam th (sam@bur-jud-118-039.rh.uchicago.edu)
Date: Tue Feb 22 2000 - 19:17:41 CST



On Tue, 22 Feb 2000, Paul Rohr wrote:

> Yikes, it's a DTD! Guess that's what I get for taking the holiday weekend
> off, huh?
>
> To be honest, I have *very* mixed emotions about seeing this. DTDs just
> look so darned official, you know? It makes people think they know exactly
> what does and doesn't belong in our file format. But they're wrong.

This may be. And for me, even as the DTD author, it is certainly true.
In fact, just writing the DTD helped me to better understand the file
format. However, I think it's a _good_ thing to have people understand
our file format.

1- Other people should be able to understand. If, say, KWord wants to
import .abw documents, then the best thing is to have a spec available.
  
2- Us. Lots of AbiWord developers will want to understand the format, and
the easiest way to do that is with a spec.

>
> As one of the designated file format enforcers, I've gleefully taken
> advantage of the fact that nobody thinks they understand our file format
> well enough to write a DTD. You *have* to look at the source of our product
> to know what the file format currently is.

Unfortunately, if you could just look at ie_[exp,imp]_AbiWord_1.cpp and
understand the format, then everything would be good (and the DTD I wrote
would be better). But that isn't the case. For example, it took serious
searching to find the list of fields that I included. If someone wants to
understand the format, we should not force them to learn the entire code
base. Again, think of an outside developer who wants to access AbiWord
through the _file format_, not through C++. They would have a much harder
time than I finding the information.

>
> IMHO, this lack of documentation is a Good Thing.
>
> A. No documentation is better than wrong documentation.
> --------------------------------------------------------
> This may sound like heresy, but don't get me wrong. Documentation is good.
> I *like* documentation -- provided it's accurate.
>
> However, there have been far too many times in my life that I've gotten
> burned by misleading or outdated documentation, particularly when I didn't
> *know* how flawed it was. I'd much rather do without documentation than get
> confused by something that's inaccurate.

What you say here is certainly correct. However, I am willing to
_personally_ take responsibility for the correctness of this
documentation, and to work to keep it current with the code. Then, we
have *good* documentation, and everyone is happy.
 
>
> B. The file format is currently in flux.
> -----------------------------------------
> It's been quite a while now since there were any changes to the file format,
> but that's about to change, in a big way.
>
> For example, Luke has submitted a patch with file format changes to make
> lists (mostly) work, but he hasn't yet gotten any serious feedback from
> anyone else. (I don't know whether anyone else plans to comment on it, but
> it's definitely at or near the top of my list.)
>
> Likewise, we know that empty field tags are going to be replaced with a
> barrage of new container-style markup. Here again, there's a proposal on
> the table which hasn't gotten any serious feedback yet.
>
> Even worse, it's pretty clear to me that these two proposals will interact
> in ways which probably affect the file format, too.

This is also true. It would be a problem if we were resting on our
laurels. However, these changes are not likely to be that complicated,
from the perspective of the maintainer of a DTD. Writing the current one
took about an hour, a significant portion of which was devoted to finding
the various structures in sample documents. Putting in Luke's patch was a
2 minute job. Maintaining a DTD for XSL is hard. Maintaining one for the
AbiWord file format is not (relatively speaking).
  
>
> C. Which version of the file format are we documenting here?
> -------------------------------------------------------------
> It looks like this DTD probably includes Luke's proposed changes, but omits
> a number of other features of the file format which do exist.

This further serves to prove my first point. Despite extensive looking, I
have missed some things. However, this is my fault, and not a necessity
of a DTD. Additionally, if you know of anything I am missing, I would
appreciate you letting me know, if only for my personal edification.
 
>
> D. Any DTD which doesn't get used is probably wrong.
> -----------------------------------------------------
> The main use for DTDs is to determine whether an XML/SGML document is
> "valid" or not. However, we do not and will not ever be using DTDs in this
> way for the AbiWord file format. A valid AbiWord document is one that we
> read. Period.

Right. But the importer is likely to be far more forgiving than an XML
validator. Say I hack the exporter in some bad way. Then, the importer
accepts it, but strips out large sections of formatting. What's wrong?
Maybe there's a problem with the XML. Then again, maybe there isn't, but
it's nice to know.
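
To make that concrete: a validating parser reports structural mistakes that a forgiving importer quietly tolerates. The Python below is only a toy stand-in for such a validator. It is not a DTD parser and not AbiWord code; it checks nothing but parent/child containment. The rules in `ALLOWED_CHILDREN` are a hypothetical, simplified content model (the element names are my guess at a plausible subset, not the DTD I posted), and it assumes the document uses no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified containment rules, loosely modelled on DTD
# content models. Illustrative assumptions only, not the real AbiWord DTD.
ALLOWED_CHILDREN = {
    "abiword": {"styles", "section", "data"},
    "styles": {"s"},
    "s": set(),
    "section": {"p"},
    "p": {"c", "field"},
    "c": set(),
    "field": set(),
    "data": {"d"},
    "d": set(),
}


def find_structure_errors(xml_text):
    """Return one message per element that breaks ALLOWED_CHILDREN.

    Unlike a forgiving importer, this reports every violation instead of
    silently skipping markup it does not understand.
    """
    root = ET.fromstring(xml_text)
    if root.tag not in ALLOWED_CHILDREN:
        return [f"unknown root element <{root.tag}>"]

    errors = []
    stack = [root]  # iterative depth-first walk over known elements
    while stack:
        parent = stack.pop()
        allowed = ALLOWED_CHILDREN[parent.tag]
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN:
                errors.append(f"unknown element <{child.tag}> inside <{parent.tag}>")
                continue  # do not descend into elements with no rules
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
            stack.append(child)
    return errors


if __name__ == "__main__":
    good = "<abiword><section><p><c>Hello</c></p></section></abiword>"
    bad = "<abiword><section><c>stray run</c></section></abiword>"
    print(find_structure_errors(good))
    print(find_structure_errors(bad))
```

With these rules, a stray `<c>` placed directly inside `<section>` is reported as `<c> not allowed inside <section>`, whereas an importer might simply drop that text and say nothing.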

>
> If there's ever a conflict between a DTD and our behavior, then the DTD is
> wrong. That's not a characteristic people usually expect of DTDs. They
> want it to be the other way around, but it's not.
>

This is why I have made clear in _every_ email that I have sent that a
validation error is a problem in the DTD, not in the document.
Eventually, it may not be that way. But for now, I have taken pains to
make that clear.
And as to us always being right, I think this is naive. We could _easily_
write conflicting importers and exporters, or change one in a way we did
not intend to. Then, the DTD would be right, and the implementation would
be wrong. Currently, we have given ourselves wide latitude to change the
format. But that will not always be the case, and programmer error is
ever-present.

> bottom line
> -----------
> If we never publish a DTD for our file format, I wouldn't shed any tears.
> It's not needed for XML compliance, and I think having one sets false
> expectations among hard-core XML bigots.
>

I'll assume that you either didn't mean anyone here, or that it was a
compliment :-)

> However, I will grudgingly admit that a DTD would make a nice formal piece
> of read-only documentation which helps describe (in a more formal sense) how
> our file format works. In that vein, if people really really want to have
> one, I won't get in the way, provided that:
>
> - we wait until the file format *is* stable,
Not checked, but should we wait? We haven't waited to advocate that people
use it, so why not tell them what they are using?

> - we have someone committed to maintaining its accuracy, AND
Check

> - we clearly state that the DTD is descriptive, *not* normative
Check

>
> We are not at that point now, so my assertion is that publishing the DTD now
> does far more harm than good.
Respectfully, I disagree, mindful of the fact that you are far better
versed in the file format than I am.
>
> Paul,
> designated curmudgeon
>
>

           
                                     sam th (apparent XML weenie :)
                                     sytobinh@uchicago.edu
                                        



This archive was generated by hypermail 2b25 : Tue Feb 22 2000 - 19:15:05 CST