From: Dom Lachowicz (doml@appligent.com)
Date: Sat Sep 21 2002 - 13:17:59 EDT
Hi Andrew,
That was a great email. Unfortunately, I think that you're
misinterpreting the point here by a little bit.
AbiWord can read RTF as produced by OpenOffice, WordPad, MSWord,
WordPerfect just fine. We cannot import "RTF" as produced by Cocoa at
all because their RTF is invalid (a similar problem to "xml" that forgets to
close off tags: Cocoa closes off braces in the wrong spots and sometimes
not at all). But even this problem is not insurmountable - if we really
want it fixed, it will probably only take a day or two on some dedicated
person's part in order to fix our RTF importer to handle Cocoa/TextEdit
documents.
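To make the "braces in the wrong spots" problem concrete, here is a minimal sketch in Python of a group-balance check over RTF text. It is not AbiWord's importer code; the function name check_rtf_groups is made up for this illustration, and it assumes the file has no \bin payloads (raw binary data could contain stray braces). Escaped \{, \} and \\ are skipped so that only real group delimiters are counted.

```python
def check_rtf_groups(data: str) -> list:
    """Return a list of brace problems found in RTF text (empty if balanced).

    Hypothetical helper for illustration only; assumes no \\bin payloads.
    """
    problems = []
    depth = 0
    i = 0
    while i < len(data):
        ch = data[i]
        # A backslash followed by {, } or \ is an escaped literal character,
        # not a group delimiter, so skip both characters.
        if ch == "\\" and i + 1 < len(data) and data[i + 1] in "{}\\":
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # A close brace with no open group: closed in the wrong spot.
                problems.append(f"unmatched '}}' at offset {i}")
            else:
                depth -= 1
        i += 1
    if depth:
        # Groups that were opened but never closed.
        problems.append(f"{depth} group(s) never closed")
    return problems
```

A tolerant importer could use a check like this to decide whether to fall back to a recovery mode (for example, implicitly closing any groups still open at end of file) instead of rejecting the document outright.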
> Almost all word processors can import and export it.
> This is sometimes sufficient reason for us not to
> support the native file format of a program.
> GUI elements support it. The textarea or similar
> control on Windows uses RTF. I'm pretty sure GTK has
> something similar and QT and Mac might.
Qt and GTK don't support RTF. KOffice didn't support RTF "well" except
in their 1.2 release which came out a week ago.
RTF is a good interchange format. However, it is not a good reason to
abandon support for native file formats, and I have *never* advocated
that. Programs always support their native format better than any other.
In the case of Word, their native format is insanely hard to write, and
their RTF support is *fabulous*. RTF could also be considered Word's
native format for several reasons:
1) It actually was
2) MSFT writes and owns the spec, and their product is (on average) a
very good reference implementation of said spec
3) Its filter plugins convert FOO->RTF for MSWord on import, and
RTF->FOO on export. This is further evidenced by people suggesting that
we use Word's filter plugins (with .cnv extensions) if found inside of
AbiWord via a wrapper plugin.
IMHO, interchanging RTF with Word is an exceptionally good workaround for
the moment until someone steps up to code a DOC exporter. For other
products that aren't Microsoft Word, RTF might or might not be a
suboptimal exchange format.
> RTF is the standard format for formatted text on the
> clipboards of various OSs, window managers, etc.
> Without good RTF clipboard support, the clipboard is
> only useful for plain text.
No, it really isn't. It is the standard on Windows and Windows alone.
MacOSX has no real standard, excepting possibly PDF. Unix, BeOS, and QNX
have no de facto standard, apart from plain text. So does "various OSs" in this
context mean "Windows 95 through XP" just like Microsoft claims to write
"cross-platform" applications based on the same assumption? Just kidding
:)
There are other structured contents that can be and *are* placed on the
clipboard such as HTML. While this is technically not as "truthful" as
RTF for a Word Processor model, it is, in my opinion, acceptable and in
many cases preferable. We cannot force other programs such as Mozilla
and Gnumeric to post RTF on the clipboard. We cannot change the world,
we can only hope to thrive within the constructs set up by other
programs, which means supporting RTF and a variety of other formats
well.
RTF makes sense for WP->WP exchange. In the cases where we're going from
FOO->WP or WP->FOO, RTF is *worse* than plaintext because it will either
simply be ignored by the other application or will not be present at
all. This holds true for any non-Windows platform, and probably a good
number of the Windows programs as well. In many cases, RTF is a
sub-optimal or non-existent exchange medium, and something like HTML
makes much more sense. We'll have to learn to deal with this.
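A rough sketch of what "dealing with this" could look like on the paste side, in Python: the source application offers whatever formats it has, and the word processor takes the richest one it understands, falling back to plain text. The function name, the preference order, and the use of MIME-style labels are assumptions for this illustration, not AbiWord's actual clipboard code or any toolkit's API.

```python
# Preference order for a word processor pasting from the clipboard.
# The MIME-style labels and their order are illustrative assumptions.
WP_PREFERENCE = ["text/rtf", "text/html", "text/plain"]


def pick_clipboard_format(offered, preference=WP_PREFERENCE):
    """Return the most preferred format the source actually offers, or None."""
    for fmt in preference:
        if fmt in offered:
            return fmt
    # Nothing we understand is on the clipboard.
    return None
```

With this order, a paste from another word processor that posts RTF gets RTF, while a paste from a browser that only posts HTML and plain text gets HTML rather than losing all structure.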
> But there is a major problem - Nobody supports the
> same RTF.
Most applications do reasonably support reading and writing some version
of the RTF SPEC. Some have quirks that can be worked around, some quirks
cannot without breaking "support" for another application. The RTF spec
has been fairly static - documents produced by a Version 7 compliant
application like ours have a good chance of being imported into
older apps, plus or minus some quirks. This is similar to some HTML not
looking good in Netscape 4.7. The spec is also backward compatible, so that
documents produced by, say, version 2 apps can be loaded fine in Version
7 apps (OO->AbiWord).
> RTF is a "standard" but nobody implements the
> standard.
This is true of most standards. However, most programs really do reasonably
support some version of the RTF spec, plus or minus some quirks.
> MS WordPad is installed on every Windows machine by
> default and has been for a very long time. It reads
> MS Word .doc files and reads and writes RTF. For
> every Windows machine without MS Word installed, this
> is the standard .doc and RTF software.
> But WordPad's RTF is different to both the standard
> and to Word's RTF.
> AbiWord has significant problems, especially with
> lists, both importing and exporting MS WordPad RTF.
WordPad reads and writes DOC and RTF *poorly*. WordPad is not a word
processor. Neither is OSX's TextEdit, which reads and writes RTF.
Looking at WordPad, do you think that one line of code has changed since
the version that ships with Win95? The appearance and functionality
seem to be the same. Does WordPad seem to be a priority for Microsoft?
In the real world, a lot of machines have Word installed. For Windows
machines without Word installed, we cannot and will not assume that
WordPad is the standard we measure ourselves against - it's ludicrous to
think so. No one does much or serious work with this product, and if
these people have RTF and DOC documents, they most likely came from
another computer or a third party who does own Word. For cases where
Word or some similar product isn't installed, *we* should be considered
the standard and aim to inter-operate with the Windows clipboard and the
various third-party documents flawlessly.
> Mac OS X has also embraced RTF. The standard Mac text
> editor has been upgraded from the OS 9 days and now
> supports full Unicode and its default save format is
> RTF.
> But the OS X text editor creates drastically different
> RTF from all of the standard, MS Word, and MS WordPad.
> For every Mac with OS X installed, this is the
> standard editor. The OS X version of RTF has been
> called broken and I bet it is. Yet MS Word and MS
> WordPad have no problems with any OS X RTF file I
> can find.
> AbiWord cannot load OS X RTF at all.
The OSX version is broken, and I wouldn't call what they're doing to RTF
"embracing." This is something that we will honestly try very hard to
support reading. While OSX RTF represents a minuscule portion of the RTF
documents as a whole, we should still try our best to support reading it.
If you have actually used their TextEdit (which I have since I own
Jaguar) you'll see a few things:
1) It doesn't represent Word RTF very well
2) It doesn't represent our RTF very well
3) It doesn't represent OO RTF very well
4) Doesn't have a way to change any paragraph prop besides alignment
5) Has minimal text formatting constructs
6) Won't do lists
7) Won't do tables
8) Can't change page or section properties
9) Is perhaps the most butt-ugly app in existence
10) Is painful to work with
11) Doesn't follow a large majority of Apple's UI guidelines (I guess
this goes with 9 and 10)
But it will valiantly ignore our lists, tables, page and section
properties. With TextEdit, this is the best we can hope for. The product
looks and behaves like an afterthought implemented in a weekend. If we
got a native Cocoa port of Abi working, I think we could actually
present it to Apple as a replacement for TextEdit and not be laughed out
of their headquarters.
> OpenOffice also supports a broken version of RTF.
> At least in this case, being open source, there is
> reasonable hope that it can be fixed if people want it
> fixed.
I have the utmost faith in Caolan, so much so that I'm going to call
this a non-issue.
> We cannot currently load OpenOffice RTF.
> OpenOffice has recently decided to lower the priority
> of RTF support in favour of better Word .doc support.
> This is not going to help document interchange between
> open source word processors, and it's definitely going
> to make clipboard support weaker.
This is a fallacy. We can load OpenOffice RTF. I just created 3 fairly
complex RTF documents using OO. I also converted 5 MSWord documents into
RTF using OO. We loaded the RTF fine in *every single case*.
> I don't know the status of RTF with other open source
> editors such as KWord.
Used to suck, but has gotten a _whole lot_ better with the recent 1.2
release. Not as good as ours, but shows a heap of promise.
> If AbiWord has such problems with all of these very
> common flavours of RTF, what hope do we have importing
> the many various flavours output by minor and exotic
> word processors such as those developed for specific
> languages.
Probably a better chance than you think. I'd like to see some hard data
here rather than mere speculation.
> What the open source community needs is a standard
> RTF library much like exists for various image file
> formats, and even more like our own Wv.
This is probably true. This will do little to help our integration with
the proprietary applications that exist (including OO/SO), which make up
99% of the documents floating around the web today.
I'd love to inter-operate with every application out there. But, as in
life, all products are not created equal. Realistically, we can do well
to mold our importer to handle the various types of RTF that are output
by *any* product in the market today.
However, the export problem isn't a compromise between "adhering to the
standard and going with what MS does." In reality, we need to support
writing something that the major players out there handle robustly. In
my opinion, the major players are:
Word, WordPerfect, Lotus, OpenOffice, Abi, KWord, Gobe
TextEdit, Ted, and WordPad aren't.
This also assumes that making a change to stick with MS's handling of
things will break support for another app, which it often doesn't. It
also assumes that the SPEC is complete and without error, which it is
not.
As an aside, I work with PDF files all day long, including writing
things to robustly parse through the files and read them. I also have
written several PDF producers. There are 4 major PDF specs for versions
1-4. There are a lot of PDF producers out there. Most producers output
crappy and sometimes invalid PDF. There are only a few de facto readers
out there. I've read all of the specs. The specs are incomplete,
misinformed, and sometimes lie. The validity of your PDF isn't measured
against the SPEC, it's measured against how well Acrobat and Reader
handle your stuff. Documentation writers, not program writers, write
these sorts of specs, and no-one catches all of the bugs in a 400-page
document. They're full of shortcomings and flaws by their very
nature, as are programs such as Word. The harsh reality is that a
book/spec doesn't read through our RTF file, Word does.
Users care little about specs, and so long as there's really one de facto
producer/consumer of RTF on the market AND the spec has shortcomings,
the de facto product in reality becomes the spec. I'm suggesting that
most of the products on my "A-List" actually measure themselves against
how well they behave when playing with MSWord instead of how well they
play with the SPEC. Or if they don't, well, they probably ought to.
FWIW, any code of mine in the RTF importer and exporter is up for grabs
for such a (L)GPL librtf library, including the bits in wv to map
languages, lids, locales, ... to iconv descriptors and ISO language
tags.
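For anyone unfamiliar with what that mapping involves, here is a small Python sketch of the idea: RTF's \langN keyword carries a Windows language ID (LID) as a decimal number, and the importer has to turn it into a language tag. This is not the wv code (which is C and covers far more languages); the table below is a tiny illustrative subset and the function name is hypothetical.

```python
# A few well-known Windows language IDs and their language tags.
# Illustrative subset only; a real table covers many more entries.
LID_TO_TAG = {
    0x0409: "en-US",  # \lang1033
    0x0809: "en-GB",  # \lang2057
    0x0407: "de-DE",  # \lang1031
    0x040C: "fr-FR",  # \lang1036
    0x0415: "pl-PL",  # \lang1045
}


def lid_to_language_tag(lid: int, default: str = "und") -> str:
    """Map a Windows LID to a language tag.

    Unknown LIDs fall back to `default`; "und" is the ISO 639 code
    for an undetermined language.
    """
    return LID_TO_TAG.get(lid, default)
```

For example, lid_to_language_tag(1033) gives "en-US", and an LID missing from the table gives "und" rather than raising an error, so an unknown language never aborts an import.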
Inter-operability is a great and lofty goal; a goal that we should
strive for without question. How we go about doing so, however, should
come under scrutiny and should be done for the right reasons and should
be done using the correct methods. We clearly can't inter-operate
perfectly with every app on the planet. We should do our best to
integrate well with the products we care most about. We should further
do our best not to fsck up too badly with the ones we care less about.
Prioritization is key.
Cheers,
Dom
This archive was generated by hypermail 2.1.4 : Sat Sep 21 2002 - 13:21:39 EDT