Re: commit (HEAD): toward AW native clipboard

From: tomasfrydrych@yahoo.co.uk
Date: Mon Jun 30 2003 - 03:31:28 EDT

    Hi Andrew,

    > I still don't understand why we "have to insert
    > explicit characters". I understand that Word does
    > things differently and that we *have to* work with it.
    > But I don't understand why inserting these characters
    > "has to" be the solution. It seems to have a larger
    > and larger flow-on effect for us yet for MS Word it
    > can save and load RTF files just fine without the use
    > of these characters. Why can't we solve the problem
    > more in the way that MS does?
    >
    > Or are you saying that MS Word RTF files are special
    > in this regard or that MS Word .doc files work fine
    > but their RTF doesn't even work for themselves?

    The problem is that the bidirectional algorithm used by Word (and
    assumed in both the RTF and DOC formats) is different from the Unicode
    bidirectional algorithm, which we use. When we load text from RTF (or
    DOC) and do Unicode bidirectional layout on it, we get a different
    layout than when the same document is viewed in Word.

    The cause of this is MS's handling of neutral (punctuation, white
    space) and weak (numbers) characters, which are treated as
    directionally strong, with their directionality derived from the keyboard layout
    used to input them. For example, '0' input with a Hebrew keyboard will
    be considered strong RTL, and '0' input with an English keyboard will
    be considered strong LTR. In the RTF format this is reflected by the
    former being flagged as \rtlch and the latter as \ltrch (an equivalent
    thing happens in the DOC binary format).
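
    To make the strong/weak/neutral distinction concrete, here is a small
    sketch that groups the Unicode bidi classes the way the Unicode
    algorithm does. It is only an illustration in Python, not AbiWord
    code; the function name bidi_strength is made up for this example.

```python
import unicodedata

# Groupings follow the Unicode bidirectional character types.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}

def bidi_strength(ch):
    """Return 'strong', 'weak', 'neutral' or 'other' for one character.

    'other' covers explicit formatting characters and unassigned code points.
    """
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    if cls in NEUTRAL:
        return "neutral"
    return "other"

for ch in ("a", "\u05d0", "0", " ", "!"):
    print(repr(ch), unicodedata.bidirectional(ch), bidi_strength(ch))
```

    Under Unicode rules '0' is weak whatever keyboard typed it, which is
    exactly the information Word keeps per character and we do not.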

    In order to recreate the original layout using the Unicode algorithm, we
    have to mark these characters with an override on import. Similarly on
    export to RTF, we need to use our layout information to decide
    whether a given neutral/weak character should be marked as LTR or
    RTL (this really sucks). The only other alternative would be to use
    different bidi algorithms for different types of documents, and as far
    as I am concerned, that is a no-no (it creates usability problems, quite
    apart from someone having to implement the alternative algorithms).
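
    A rough sketch of the import side, assuming each imported run already
    carries the direction flag read from the file. This is not the actual
    importer; apply_run_direction and its parameters are hypothetical
    names, and the sketch wraps every stretch of weak/neutral characters
    unconditionally, whereas a real importer would probably only do so
    where the Unicode algorithm would otherwise give a different result.

```python
import unicodedata

# Unicode explicit directional overrides and their terminator.
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"
STRONG = {"L", "R", "AL"}

def apply_run_direction(text, rtl):
    """Wrap each stretch of non-strong characters in an override
    matching the direction flag of the run (rtl=True or False)."""
    opener = RLO if rtl else LRO
    out = []
    pending = []  # current stretch of weak/neutral characters
    for ch in text:
        if unicodedata.bidirectional(ch) in STRONG:
            if pending:
                out.append(opener + "".join(pending) + PDF)
                pending = []
            out.append(ch)
        else:
            pending.append(ch)
    if pending:
        out.append(opener + "".join(pending) + PDF)
    return "".join(out)

# A run of digits flagged RTL in the file gets an RTL override.
print(ascii(apply_run_direction("12", rtl=True)))
```

    Export would be the reverse: look at the resolved direction our layout
    gave each weak/neutral character and emit the matching flag.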

    Tomas


