Re: commit (HEAD): toward AW native clipboard

From: tomasfrydrych@yahoo.co.uk
Date: Mon Jun 30 2003 - 03:31:28 EDT

Next message: Vital Khilko: "Belarusian AbiWord translation"

Previous message: Andrew Dunbar: "Re: commit (HEAD): toward AW native clipboard"
In reply to: Andrew Dunbar: "Re: commit (HEAD): toward AW native clipboard"

Hi Andrew,

> I still don't understand why we "have to insert
> explicit characters". I understand that Word does
> things differently and that we *have to* work with it.
> But I don't understand why inserting these characters
> "has to" be the solution. It seems to have a larger
> and larger flow-on effect for us yet for MS Word it
> can save and load RTF files just fine without the use
> of these characters. Why can't we solve the problem
> more in the way that MS does?
>
> Or are you saying that MS Word RTF files are special
> in this regard or that MS Word .doc files work fine
> but their RTF doesn't even work for themselves?

The problem is that the bidirectional algorithm used by Word (and
assumed in both the RTF and DOC formats) is different from the Unicode
bidirectional algorithm which we use. When we load text from RTF (or
DOC) and do Unicode bidirectional layout on it, we get a different
layout from the one Word shows for the same document.

The cause of this is MS's handling of neutral (punctuation, white
space) and weak (numbers) characters, which are treated as
directionally strong, their directionality derived from the keyboard
layout used to input them. For example, '0' input with a Hebrew keyboard
will be considered strong RTL, and '0' input with an English keyboard
will be considered strong LTR. In the RTF format this is reflected by
the former being flagged as \rtlch and the latter as \ltrch (an
equivalent thing happens in the DOC binary format).
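To make the strong/weak/neutral distinction concrete, here is a small Python sketch that asks the Unicode character database how it classifies a few characters. The helper name strength() and the grouping of bidi classes into three buckets are my own simplification for illustration (explicit formatting characters simply fall into the "neutral" bucket here); none of this is AbiWord code.

```python
import unicodedata

# Bidi classes grouped roughly as the Unicode bidirectional algorithm does.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}


def strength(ch):
    """Return 'strong', 'weak' or 'neutral' for a single character.

    Simplification: anything that is neither strong nor weak is
    reported as neutral, including explicit formatting characters.
    """
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        return "strong"
    if cls in WEAK:
        return "weak"
    return "neutral"


if __name__ == "__main__":
    for ch in ["a", "\u05d0", "0", " ", "!"]:
        print(repr(ch), unicodedata.bidirectional(ch), strength(ch))
```

Under Unicode the digit '0' is weak no matter which keyboard produced it, which is exactly the information that Word keeps and the Unicode algorithm does not.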

In order to recreate the original layout using the Unicode algorithm, we
have to mark these characters with an override on import. Similarly, on
export to RTF we need to use our layout information to decide
whether a given neutral/weak character should be marked as LTR or
RTL (this really sucks). The only alternative would be to use
different bidi algorithms for different types of documents, and as far
as I am concerned, that is a no-no (it creates usability problems, quite
apart from someone having to implement the alternative algorithms).
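As a rough illustration of both directions, here is a Python sketch. It assumes the importer hands us runs of text already tagged "ltr" or "rtl" (from the RTF control words \ltrch and \rtlch), and that on export we have a resolved embedding level per character, where an odd level means RTL. The function names, the run representation, and the choice of the Unicode override characters LRO/RLO/PDF as the marking mechanism are all assumptions of mine; this message does not say which explicit characters or properties AbiWord actually uses.

```python
import unicodedata
from itertools import groupby

LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"  # overrides and their terminator
STRONG = {"L", "R", "AL"}


def mark_run(text, direction):
    """Import side: wrap each stretch of weak/neutral characters in an
    override matching the direction the source file gave the run."""
    opener = RLO if direction == "rtl" else LRO
    out = []
    in_override = False
    for ch in text:
        if unicodedata.bidirectional(ch) in STRONG:
            if in_override:
                out.append(PDF)  # close the override before strong text
                in_override = False
        elif not in_override:
            out.append(opener)
            in_override = True
        out.append(ch)
    if in_override:
        out.append(PDF)
    return "".join(out)


def export_runs(chars_with_levels):
    """Export side: group (char, level) pairs into runs by level parity
    and pair each run with the RTF control word it would be flagged with."""
    runs = []
    for is_rtl, group in groupby(chars_with_levels, key=lambda p: p[1] % 2 == 1):
        word = r"\rtlch" if is_rtl else r"\ltrch"
        runs.append((word, "".join(ch for ch, _ in group)))
    return runs
```

For example, mark_run("10", "rtl") wraps both digits in RLO ... PDF, while mark_run("abc", "ltr") returns the text unchanged because every character is already strong. A real importer would probably only add the override where the flagged direction differs from what the Unicode algorithm would resolve on its own; the sketch marks every weak/neutral stretch to keep it short.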

Tomas

This archive was generated by hypermail 2.1.4 : Mon Jun 30 2003 - 03:43:24 EDT