Re: RTF idea

From: Dom Lachowicz (doml@appligent.com)
Date: Sat Sep 21 2002 - 13:17:59 EDT


    Hi Andrew,

    That was a great email. Unfortunately, I think that you're
    misinterpreting the point here by a little bit.

    AbiWord can read RTF as produced by OpenOffice, WordPad, MSWord,
    WordPerfect just fine. We cannot import "RTF" as produced by Cocoa at
    all because their RTF is invalid (similar to "XML" that forgets to
    close its tags: Cocoa closes braces in the wrong spots and sometimes
    not at all). But even this problem is not insurmountable - if we really
    want it fixed, it will probably only take a day or two on some dedicated
    person's part in order to fix our RTF importer to handle Cocoa/TextEdit
    documents.
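
As a rough illustration of the kind of leniency such a fix would need, here is a minimal Python sketch (not AbiWord's importer code; the function name is hypothetical). It only handles the simplest cases: it drops close braces that have no matching open brace and appends any close braces still missing at end of input. It cannot repair braces closed in the wrong spot, and it assumes the input contains no \bin data, which may legally hold raw braces.

```python
def balance_rtf_braces(text: str) -> str:
    """Drop unmatched '}' and append missing '}' so that RTF groups nest.

    Hypothetical helper for illustration only; a real importer would
    recover inside its parser rather than pre-filtering the text.
    """
    out = []
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # A backslash starts a control word or an escaped character
            # (\{, \}, \\). Copy it with the next character untouched so
            # escaped braces are never counted as group delimiters.
            out.append(text[i:i + 2])
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # Stray close brace with nothing open: skip it.
                i += 1
                continue
            depth -= 1
        out.append(ch)
        i += 1
    # Close any groups that were left open at end of input.
    out.append("}" * depth)
    return "".join(out)


if __name__ == "__main__":
    print(balance_rtf_braces(r"{\rtf1 {\b bold text}"))
```

Counting braces this way keeps well-formed input unchanged, so running it in front of a strict parser should not affect documents from producers that already nest their groups correctly.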

    > Almost all word processors can import and export it.
    > This is sometimes sufficient reason for us not to
    > support the native file format of a program.
    > GUI elements support it. The textarea or similar
    > control on Windows uses RTF. I'm pretty sure GTK has
    > something similar and QT and Mac might.

    Qt and GTK don't support RTF. KOffice didn't support RTF "well" except
    in their 1.2 release which came out a week ago.

    RTF is a good interchange format. However, it is not a good reason to
    abandon support for native file formats, and I have *never* advocated
    that. Programs always support their native format better than any other.
    In the case of Word, their native format is insanely hard to write, and
    their RTF support is *fabulous*. RTF could also be considered Word's
    native format for several reasons:

    1) It actually was
    2) MSFT writes and owns the spec, and their product is (on average) a
    very good reference implementation of said spec
    3) Its filter plugins convert FOO->RTF for MSWord on import, and
    RTF->FOO on export. This is further evidenced by people suggesting that
    we use Word's filter plugins (with .cnv extensions) if found inside of
    AbiWord via a wrapper plugin.

    IMHO, interchanging RTF with Word is an exceptionally good workaround for
    the moment until someone steps up to code a DOC exporter. For other
    products that aren't Microsoft Word, RTF might or might not be a
    suboptimal exchange format.
     
    > RTF is the standard format for formatted text on the
    > clipboards of various OSs, window managers, etc.
    > Without good RTF clipboard support, the clipboard is
    > only useful for plain text.

    No, it really isn't. It is the standard on Windows and Windows alone.
    MacOSX has no real standard, excepting possibly PDF. Unix, BeOS, and QNX
    have no de facto standard, minus text. So does "various OSs" in this
    context mean "Windows 95 through XP" just like Microsoft claims to write
    "cross-platform" applications based on the same assumption? Just kidding
    :)

    There are other structured contents that can be and *are* placed on the
    clipboard such as HTML. While this is technically not as "truthful" as
    RTF for a Word Processor model, it is, in my opinion, acceptable and in
    many cases preferable. We cannot force other programs such as Mozilla
    and Gnumeric to post RTF on the clipboard. We cannot change the world,
    we can only hope to thrive within the constructs set up by other
    programs, which means supporting RTF and a variety of other formats
    well.

    RTF makes sense for WP->WP exchange. In the cases where we're going from
    FOO->WP or WP->FOO, RTF is *worse* than plaintext because it will either
    simply be ignored by the other application or will not be present at
    all. This holds true for any non-windows platform, and probably a good
    number of the windows programs as well. In many cases, RTF is a
    sub-optimal or non-existent exchange medium, and something like HTML
    makes much more sense. We'll have to learn to deal with this.
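
The "support RTF and a variety of other formats" approach amounts to a preference-ordered negotiation over whatever the other application posted. A minimal Python sketch, assuming the clipboard exposes its offered formats as a collection of MIME-style labels (the labels, the ordering, and the function name are illustrative assumptions, not any toolkit's API):

```python
# Preference order for pasting into a word processor: richest format
# first, plain text as the last resort. This ordering is an assumption
# made for illustration.
PREFERRED_FORMATS = ("text/rtf", "text/html", "text/plain")


def pick_clipboard_format(offered, preferred=PREFERRED_FORMATS):
    """Return the most preferred format that the source actually offers.

    `offered` is any collection of format labels posted by the other
    application. Returns None when nothing usable is on the clipboard.
    """
    for fmt in preferred:
        if fmt in offered:
            return fmt
    return None


if __name__ == "__main__":
    # A browser-like source that posts HTML and text but no RTF.
    print(pick_clipboard_format({"text/html", "text/plain"}))
```

With this shape, a source that never posts RTF still yields structured content via HTML instead of degrading straight to plain text.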

    > But there is a major problem - Nobody supports the
    > same RTF.

    Most applications do reasonably support reading and writing some version
    of the RTF SPEC. Some have quirks that can be worked around, some quirks
    cannot without breaking "support" for another application. The RTF spec
    has been fairly static - documents produced by a Version 7 compliant
    applications like ourself have a good chance of being imported into
    older apps, plus or minus some quirks. This is similar to some HTML not
    looking good in Netscape 4.7. The spec is forward compatible, so that
    documents produced by, say, version 2 apps can be loaded fine in Version
    7 apps (OO->AbiWord).

    > RTF is a "standard" but nobody implements the
    > standard.

    This is true of most standards. However, most programs really do reasonably
    support some version of the RTF spec, plus or minus some quirks.
     
    > MS WordPad is installed on every Windows machine by
    > default and has been for a very long time. It reads
    > MS Word .doc files and reads and writes RTF. For
    > every Windows machine without MS Word installed, this
    > is the standard .doc and RTF software.
    > But WordPad's RTF is different to both the standard
    > and to Word's RTF.
    > AbiWord has significant problems, especially with
    > lists, both importing and exporting MS WordPad RTF.

    WordPad reads and writes DOC and RTF *poorly*. WordPad is not a word
    processor. Neither is OSX's TextEdit, which reads and writes RTF.
    Looking at WordPad, do you think that one line of code has changed since
    the version that ships with Win95? The appearance and functionality
    seem to be the same. Does WordPad seem to be a priority for Microsoft?

    In the real world, a lot of machines have Word installed. For Windows
    machines without Word installed, we cannot and will not assume that
    WordPad is the standard we measure ourselves against - it's ludicrous to
    think so. No one does much or serious work with this product, and if
    these people have RTF and DOC documents, they most likely came from
    another computer or a third party who does own Word. For cases where
    Word or some similar product isn't installed, *we* should be considered
    the standard and aim to inter-operate with the Windows clipboard and the
    various third-party documents flawlessly.

    > Mac OS X has also embraced RTF. The standard Mac text
    > editor has been upgraded from the OS 9 days and now
    > supports full Unicode and its default save format is
    > RTF.
    > But the OS X text editor creates drastically different
    > RTF from all of the standard, MS Word, and MS WordPad.
    > For every Mac with OS X installed, this is the
    > standard editor. The OS X version of RTF has been
    > called broken and I bet it is. Yet MS Word and MS
    > WordPad have no problems with any OS X RTF file I
    > can find.
    > AbiWord cannot load OS X RTF at all.

    The OSX version is broken, and I wouldn't call what they're doing to RTF
    "embracing." This is something that we will honestly try very hard to
    support reading. While OSX RTF represents a minuscule portion of the RTF
    documents as a whole, we should still try our best to support reading it.
    If you have actually used their TextEdit (which I have since I own
    Jaguar) you'll see a few things:

    1) It doesn't represent Word RTF very well
    2) It doesn't represent our RTF very well
    3) It doesn't represent OO RTF very well
    4) Doesn't have a way to change any paragraph prop besides alignment
    5) Has minimal text formatting constructs
    6) Won't do lists
    7) Won't do tables
    8) Can't change page or section properties
    9) Is perhaps the most butt-ugly app in existence
    10) Is painful to work with
    11) Doesn't follow a large majority of Apple's UI guidelines (I guess
    this goes with 9 and 10)

    But it will valiantly ignore our lists, tables, page and section
    properties. With TextEdit, this is the best we can hope for. The product
    looks and behaves like an afterthought implemented in a weekend. If we
    got a native Cocoa port of Abi working, I think we could actually
    present it to Apple as a replacement for TextEdit and not be laughed out
    of their headquarters.

    > OpenOffice also supports a broken version of RTF.
    > At least in this case, being open source, there is
    > reasonable hope that it can be fixed if people want it
    > fixed.

    I have the utmost faith in Caolan, so much so that I'm going to call
    this a non-issue.

    > We cannot currently load OpenOffice RTF.
    > OpenOffice has recently decided to lower the priority
    > of RTF support in favour of better Word .doc support.
    > This is not going to help document interchange between
    > open source word processors, and it's definitely going
    > to make clipboard support weaker.

    This is a fallacy. We can load OpenOffice RTF. I just created 3 fairly
    complex RTF documents using OO. I also converted 5 MSWord documents into
    RTF using OO. We loaded the RTF fine in *every single case*.
     
    > I don't know the status of RTF with other open source
    > editors such as KWord.

    Used to suck, but has gotten a _whole lot_ better with the recent 1.2
    release. Not as good as ours, but shows a heap of promise.

    > If AbiWord has such problems with all of these very
    > common flavours of RTF, what hope do we have importing
    > the many various flavours output by minor and exotic
    > word processors such as those developed for specific
    > languages.

    Probably a better chance than you think. I'd like to see some hard data
    here rather than mere speculation.

    > What the open source community needs is a standard
    > RTF library much like exists for various image file
    > formats, and even more like our own Wv.

    This is probably true. This will do little to help our integration with
    the proprietary applications that exist (including OO/SO), which make up
    99% of the documents floating around the web today.

    I'd love to inter-operate with every application out there. But, as in
    life, all products are not created equal. Realistically, we can do well
    to mold our importer to handle the various types of RTF that are output
    by *any* product in the market today.

    However, the export problem isn't a compromise between "adhering to the
    standard and going with what MS does." In reality, we need to support
    writing something that the major players out there handle robustly. In
    my opinion, the major players are:

    Word, WordPerfect, Lotus, OpenOffice, Abi, KWord, Gobe

    TextEdit, Ted, and WordPad aren't.

    This also assumes that making a change to stick with MS's handling of
    things will break support for another app, which it often doesn't. It
    also assumes that the SPEC is complete and without error, which it is
    not.

    As an aside, I work with PDF files all day long, including writing
    things to robustly parse through the files and read them. I also have
    written several PDF producers. There are 4 major PDF specs for versions
    1-4. There are a lot of PDF producers out there. Most producers output
    crappy and sometimes invalid PDF. There are only a few de-facto readers
    out there. I've read all of the specs. The specs are incomplete,
    misinformed, and sometimes lie. The validity of your PDF isn't measured
    against the SPEC, it's measured against how well Acrobat and Reader
    handle your stuff. Documentation writers, not program writers, write
    these sorts of specs, and no-one catches all of the bugs in a 400 page
    long document. They're full of shortcomings and flaws by their very
    nature, as are programs such as Word. The harsh reality is that a
    book/spec doesn't read through our RTF file, Word does.

    Users care little about specs, and so long as there's really 1 de facto
    producer/consumer of RTF on the market AND the spec has shortcomings,
    the de facto product in reality becomes the spec. I'm suggesting that
    most of the products on my "A-List" actually measure themselves against
    how well they behave when playing with MSWord instead of how well they
    play with the SPEC. Or if they don't, well, they probably ought to.

    FWIW, any code of mine in the RTF importer and exporter is up for grabs
    for such a (L)GPL librtf library, including the bits in wv to map
    languages, lids, locales, ... to iconv descriptors and ISO language
    tags.
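
For readers unfamiliar with what such a mapping looks like, here is a small Python sketch. It is not the wv code: the table name, the function name, and the fallback values are assumptions for illustration, and only a handful of well-known Windows language IDs are listed. RTF carries these IDs in decimal, e.g. \lang1033 for 0x0409.

```python
# Windows language ID (lid) -> (language tag, codepage name).
# A tiny illustrative subset; a real table has hundreds of entries.
LID_TABLE = {
    0x0409: ("en-US", "cp1252"),  # English (United States)
    0x0407: ("de-DE", "cp1252"),  # German (Germany)
    0x0419: ("ru-RU", "cp1251"),  # Russian
    0x0411: ("ja-JP", "cp932"),   # Japanese
}

# Fallback for unknown lids: "und" is the tag for an undetermined
# language; defaulting to cp1252 is an assumption made for this sketch.
DEFAULT_ENTRY = ("und", "cp1252")


def lookup_lid(lid: int):
    """Return (language_tag, codepage) for a language ID, or the default."""
    return LID_TABLE.get(lid, DEFAULT_ENTRY)


if __name__ == "__main__":
    tag, codepage = lookup_lid(1033)  # the value found in \lang1033
    print(tag, codepage)
    # The codepage name can be handed directly to a decoder.
    print(b"\xe9".decode(lookup_lid(0x0407)[1]))
```

Keeping the table as plain data is what would make it easy to share between an importer, an exporter, and a standalone library.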

    Inter-operability is a great and lofty goal; a goal that we should
    strive for without question. How we go about doing so, however, should
    come under scrutiny and should be done for the right reasons and should
    be done using the correct methods. We clearly can't inter-operate
    perfectly with every app on the planet. We should do our best to
    integrate well with the products we care most about. We should further
    do our best not to fsck up too badly with the ones we care less about.
    Prioritization is key.

    Cheers,
    Dom


