Re: RFC: bug #2742

From: Tomas Frydrych (tomas@frydrych.uklinux.net)
Date: Mon Mar 18 2002 - 17:44:51 EST

    Hi Martin,

    > It should not be done in the formatter. We could do a scan at the
    > piecetable level just after the document has loaded. Do these changes
    > without throwing change records so we don't get into issues with
    > undo.
    >
    > The Piecetable already has the document properties and you can keep
    > track of the text direction (LTR or RTL) by examining the properties
    > of the frags as you step through the document.
     
    The reason I want to do this from the formatter (within
    fl_BlockLayout) is that determining the context is not that simple,
    because not all characters have strong directional properties;
    FriBiDi uses about a dozen initial 'direction types' that have to be
    resolved by the BIDI algorithm to either RTL or LTR. This requires
    several passes over the data, and it is the job of the layout engine
    to do that. The piecetable stores the logical shape of the document
    (i.e., the sequence of characters in the order in which the user
    inputs them), while the layout engine creates the visual sequence.
    The problem here is that in the Word format the
    piecetable is not a purely logical representation of the document; it
    contains some data that is the product of a layout engine.
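
    To illustrate the point about characters without strong directional
    properties, here is a minimal sketch in Python. It uses the standard
    unicodedata module rather than FriBiDi, and the helper names are
    hypothetical, chosen only for this example. It shows that a
    parenthesis, a digit and a space carry neutral or weak bidi classes,
    so their direction has to be resolved from context.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, Unicode bidi class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def has_strong_direction(ch):
    """True only for the strong classes: L (LTR), R and AL (RTL)."""
    return unicodedata.bidirectional(ch) in ("L", "R", "AL")

# Latin 'a', Hebrew alef, '(', digit '1', space
sample = "a\u05d0(1 "

for ch, cls in bidi_classes(sample):
    note = "strong" if has_strong_direction(ch) else "needs resolving from context"
    print(repr(ch), cls, note)
```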

    To avoid undo problems, we could let this one routine in
    fl_BlockLayout access the PT data directly -- this is always a 1:1
    transformation, so we could just overwrite the character in memory,
    rather than delete it from the piecetable and then insert the
    replacement.
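
    The difference between the two approaches can be sketched with a toy
    model in Python. ToyPieceTable and its methods are hypothetical and
    are not AbiWord's piecetable API; the sketch only assumes that
    delete and insert each produce a change record, while a same-length
    overwrite touches the stored character directly.

```python
class ToyPieceTable:
    """Hypothetical stand-in for a piece table; not AbiWord's real API."""

    def __init__(self, text):
        self.buf = list(text)
        self.change_records = []  # what an undo stack would later replay

    def delete(self, pos):
        self.change_records.append(("delete", pos, self.buf[pos]))
        del self.buf[pos]

    def insert(self, pos, ch):
        self.change_records.append(("insert", pos, ch))
        self.buf.insert(pos, ch)

    def overwrite_in_place(self, pos, ch):
        # 1:1 replacement: the length and every other offset stay the
        # same, and no change record is produced.
        self.buf[pos] = ch

    def text(self):
        return "".join(self.buf)

original = "\u05d0(\u05d1"  # alef, '(', bet

# Route 1: delete the character and insert its replacement.
edited = ToyPieceTable(original)
edited.delete(1)
edited.insert(1, ")")

# Route 2: overwrite the character in memory.
patched = ToyPieceTable(original)
patched.overwrite_in_place(1, ")")

print(len(edited.change_records), len(patched.change_records))
```

    Both routes end with the same text, but only the first leaves
    entries behind for undo to trip over.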

    > Just a caution though: suppose an author in Hebrew wanted to place a
    > ")(" in his document, would your algorithm detect this and not make
    > the change?

    That's really not a problem. The algorithm is very simple: it just
    replaces all mirrored characters in an RTL context with their mirror
    images. It does so based on the knowledge that a given file format,
    in this case Word doc, stores these visually rather than
    semantically.
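
    A minimal sketch of that replacement step, in Python. It assumes the
    direction of every character has already been resolved (for example
    by the BIDI algorithm) and is passed in as rtl_flags. The function
    name and the MIRROR table are hypothetical, and the table is only a
    small illustrative subset of the mirrored pairs.

```python
import unicodedata

# Illustrative subset of mirrored pairs; a real table would be larger.
MIRROR = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}

def mirror_in_rtl(text, rtl_flags):
    """Swap mirrored characters for their mirror image where the
    resolved direction is RTL. rtl_flags[i] is True when text[i] sits
    in an RTL context (assumed to be computed elsewhere)."""
    out = []
    for ch, rtl in zip(text, rtl_flags):
        if rtl and unicodedata.mirrored(ch):
            out.append(MIRROR.get(ch, ch))
        else:
            out.append(ch)
    return "".join(out)

# alef ')' bet '(' in RTL context, followed by " (a)" in LTR context
stored = "\u05d0)\u05d1( (a)"
flags = [True, True, True, True, False, False, False, False]
fixed = mirror_in_rtl(stored, flags)
```

    The transformation is 1:1, so the text keeps its length. Note that
    this sketch swaps every mirrored character in an RTL run, including
    a deliberate ")(", which is consistent with the premise that the
    format stores these characters visually.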

    > Also is it possible to quickly detect if a document has any RTL during
    > import so we don't have to scan ordinary docs?

    Not unless the document format explicitly stores that kind of info
    somewhere.

    Tomas


