Re: RFC: bug #2742

From: Tomas Frydrych (tomas@frydrych.uklinux.net)
Date: Mon Mar 18 2002 - 17:44:51 EST

Next message: Tomas Frydrych: "Re: RFC: bug #2742"

Previous message: Rui Miguel Silva Seabra: "commit: it-IT update"
In reply to: Martin Sevior: "Re: RFC: bug #2742"
Next in thread: Andrew Dunbar: "Re: RFC: bug #2742"
Reply: Andrew Dunbar: "Re: RFC: bug #2742"

Hi Martin,

> It should not be done in the formatter. We could do a scan at the
> piecetable level just after the document has loaded. Do these changes
> without throwing change records so we don't get into issues with
> undo.
>
> The Piecetable already has the document properties and you can keep
> track of the state of left to righted-ness by examining the properties
> of the frags as you step through the document.

The reason I want to do this from the formatter (within
fl_BlockLayout) is that determining the context is not that simple,
because not all characters have strong directional properties;
FriBiDi uses about a dozen initial 'direction types' that have to be
resolved by the BIDI algorithm to either RTL or LTR. This requires
several passes over the data, and it is the job of the layout engine
to do that. The piecetable stores the logical form of the document
(i.e., the sequence of characters in the order in which the user
inputs them), while the layout engine creates the visual sequence.
The problem here is that with the Word format the piecetable is not
a purely logical representation of the document; it contains some
content that is the product of a layout engine.
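
To illustrate why the context cannot be read off a single character, here is a heavily simplified sketch in C++. It is not FriBiDi's API: the classify() and resolve() functions and the three-way Dir type are hypothetical, and numbers, explicit embeddings and bracket pairs are ignored. Each run of neutral characters takes the direction of the strong characters around it if they agree, and the paragraph direction otherwise, loosely following rules N1/N2 of the Unicode BIDI algorithm.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified direction classes: the real BIDI algorithm (and FriBiDi)
// distinguishes many more types; here everything that is not a strong
// letter is lumped together as neutral.
enum class Dir { L, R, N };

// Hypothetical toy classifier: ASCII letters are strong LTR, the Hebrew
// block U+0590..U+05FF is strong RTL, everything else is neutral.
Dir classify(char32_t c) {
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return Dir::L;
    if (c >= 0x0590 && c <= 0x05FF) return Dir::R;
    return Dir::N;
}

// Resolve each run of neutrals from the nearest strong characters on either
// side: if both sides agree, the run takes that direction (cf. rule N1),
// otherwise it falls back to the paragraph direction (cf. rule N2).
// The start and end of the text count as the paragraph direction.
std::vector<Dir> resolve(const std::u32string& text, Dir paragraph) {
    std::vector<Dir> types;
    types.reserve(text.size());
    for (char32_t c : text) types.push_back(classify(c));

    std::vector<Dir> resolved(types);
    std::size_t i = 0;
    while (i < types.size()) {
        if (types[i] != Dir::N) { ++i; continue; }
        std::size_t j = i;
        while (j < types.size() && types[j] == Dir::N) ++j;  // neutral run [i, j)
        Dir before = (i == 0) ? paragraph : types[i - 1];
        Dir after = (j == types.size()) ? paragraph : types[j];
        Dir d = (before == after) ? before : paragraph;
        for (std::size_t k = i; k < j; ++k) resolved[k] = d;
        i = j;
    }
    return resolved;
}
```

For example, resolve(U"a (\u05D0", Dir::L) yields L, L, L, R: the space and the "(" sit between an LTR letter and an RTL letter, so they fall back to the paragraph direction. Whether a neutral counts as RTL therefore depends on its neighbours and on the paragraph setting, which is information the layout code already has at hand.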

To avoid undo problems, we could let this one routine in
fl_BlockLayout access the PT data directly. This is always a 1:1
transformation, so we could just overwrite the character in memory
rather than delete it from the piecetable and then insert the
replacement.

> Just a caution though: suppose an author in Hebrew wanted to place a
> ")(" in his document; would your algorithm detect this and not make
> the change?

That's really not a problem. The algorithm is very simple: it just
replaces all mirror characters in an RTL context with their mirror
images. It does so based on the knowledge that a given file format,
in this case Word doc, stores these visually rather than
semantically.
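
The substitution itself can be sketched as follows, assuming the per-character RTL flags have already been produced by the layout pass (the rtl vector, mirror_of() and unmirror_rtl() are hypothetical names). The mirror table is a hand-written subset for illustration only; FriBiDi provides fribidi_get_mirror_char() for the full Unicode mirroring data. Because each character maps to exactly one character, the buffer is overwritten in place and its length does not change, which is what makes the direct overwrite described above possible.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hand-written subset of the Unicode mirroring pairs, for illustration.
// Characters without a mirror map to themselves.
char32_t mirror_of(char32_t c) {
    switch (c) {
        case U'(': return U')';
        case U')': return U'(';
        case U'[': return U']';
        case U']': return U'[';
        case U'{': return U'}';
        case U'}': return U'{';
        case U'<': return U'>';
        case U'>': return U'<';
        default:   return c;
    }
}

// Word stores mirrored characters in RTL text in visual form, so on import
// we swap them back to their logical form. rtl[i] is assumed to say whether
// text[i] was resolved to an RTL context by the layout pass. The mapping is
// one character to one character, so the buffer is overwritten in place:
// its length, and every offset into it, stay the same.
void unmirror_rtl(std::u32string& text, const std::vector<bool>& rtl) {
    for (std::size_t i = 0; i < text.size() && i < rtl.size(); ++i) {
        if (rtl[i]) text[i] = mirror_of(text[i]);
    }
}
```

For instance, U"\u05D0)\u05D1(" with all four characters flagged RTL becomes U"\u05D0(\u05D1)", while the same brackets in an LTR context are left untouched.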

> Also is it possible to quickly detect if a document has any RTL during
> import so we don't have to scan ordinary docs?

Not unless the document format explicitly stores that kind of info
somewhere.

Tomas


This archive was generated by hypermail 2.1.4 : Mon Mar 18 2002 - 17:48:03 EST