state of bidi

From: Tomas Frydrych (tomas@frydrych.uklinux.net)
Date: Tue Apr 29 2003 - 15:29:38 EDT


    This is to bring everyone up to speed on the bidi functionality.

    A. STABLE
    Bidi in stable is badly broken (has been since 1.0.3; my fault
    entirely), and I have no intention of fixing it; the bidi code is too
    different from that used in HEAD and I simply do not have the time
    for this (in addition, the bidi functionality was never working well
    enough in STABLE for it to be broadly usable). Consequently, I will
    change the status of all bidi bugs filed against STABLE to 'wont
    fix', asking the reporters to use HEAD instead. A note to this effect
    should be added to the download page, which should contain links to
    the precompiled win32 daily snapshots of HEAD.

    B. HEAD
    1. General Progress
    Good progress has been made on the development code and the major
    bugs associated with the new (more efficient) code introduced after
    1.0.x have now been fixed (I think and hope); essentially, 2.0 will
    be a usable bidi wordprocessor.

    2. Arabic Support
    The basic Arabic shaping engine is, thanks to help from a Lebanese
    user, working reasonably well (basic shaping, two character
    ligatures, combining diacritics). This should make AbiWord usable for
    basic Arabic wordprocessing (and hopefully draw more feedback, and
    perhaps developers in the future). Apart from essential bug fixing, I
    do not intend to develop the shaping capabilities any further -- this
    is really a job for a third party library after the 2.0 release.
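
    To illustrate what "basic shaping" means here, the following is a
    minimal sketch and not AbiWord's engine: it picks the isolated,
    initial, medial or final presentation form of a letter from its
    logical neighbours. The FORMS table, the helper names and the
    restriction to two letters (ALEF and BEH) are assumptions made for
    the example; ligatures and combining diacritics are not handled.

```python
# Presentation forms (Arabic Presentation Forms-B) for two letters only.
# A real shaper needs the full tables; this subset is illustrative.
FORMS = {
    # ALEF joins only to the letter before it (isolated and final forms).
    "\u0627": {"isolated": "\uFE8D", "final": "\uFE8E"},
    # BEH joins on both sides (all four forms).
    "\u0628": {"isolated": "\uFE8F", "final": "\uFE90",
               "initial": "\uFE91", "medial": "\uFE92"},
}


def joins_to_next(ch):
    """True if ch can connect to the letter that follows it."""
    return ch in FORMS and "initial" in FORMS[ch]


def joins_to_prev(ch):
    """True if ch can connect to the letter that precedes it."""
    return ch in FORMS and "final" in FORMS[ch]


def shape(text):
    """Replace known letters in logical-order text with contextual forms."""
    out = []
    for i, ch in enumerate(text):
        if ch not in FORMS:
            out.append(ch)  # unknown characters pass through unchanged
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        linked_prev = joins_to_prev(ch) and joins_to_next(prev)
        linked_next = joins_to_next(ch) and joins_to_prev(nxt)
        if linked_prev and linked_next:
            form = "medial"
        elif linked_prev:
            form = "final"
        elif linked_next:
            form = "initial"
        else:
            form = "isolated"
        out.append(FORMS[ch][form])
    return "".join(out)
```

    For BEH ALEF BEH the sketch yields initial BEH, final ALEF and
    isolated BEH, because ALEF does not connect to the letter after it.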

    3. Import/Export
    a. Text Imp/Exp
    All our importers can now handle Unicode based explicit overrides
    (although some fixes are still on my list); our text exporters do so
    as well. Correct export/import of the paragraph direction is not
    handled yet (next on my list).
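
    As a rough illustration of the paragraph-direction problem for plain
    text, one common heuristic (rules P2/P3 of the Unicode bidi
    algorithm) takes the direction from the first strong character. The
    sketch below assumes that heuristic; the function name and the
    default are hypothetical, and this is not a description of what the
    AbiWord importers do or will do.

```python
import unicodedata


def guess_paragraph_direction(text, default="ltr"):
    """Guess the base direction from the first strong character.

    Bidi class L means left-to-right; R (e.g. Hebrew) and AL (Arabic
    letters) mean right-to-left. Digits, spaces and punctuation are not
    strong and are skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found
```

    Because LRM and RLM are themselves strong characters, a leading mark
    is enough to force the result in either direction under this
    heuristic.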

    b. MS Word
    The MS Word importer is getting better, but there are still issues,
    particularly with importing numbers, which need to be fixed before
    the 2.0 release (next on my list).

    c. RTF
    I am not sure about the state of the RTF imp/exporter, and this needs
    to be working well for the 2.0 release.

    C. Unicode Compliance
    1. The intention is to follow the Unicode bidi algorithm as closely
    as possible. With the recently added support for the LRO/RLO/LRM/RLM
    characters we are nearly there, being somewhere between what the
    Unicode specs call 'implicit bidirectionality' and 'full
    bidirectionality'; from the latter we are separated by (2) and (3)
    below.
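
    For reference, the control characters named in this section can be
    listed by code point. The sketch below is only an illustration: the
    BIDI_CONTROLS table and find_bidi_controls helper are names invented
    for the example, and PDF (which terminates an override or embedding
    in the Unicode algorithm) is included for completeness.

```python
import unicodedata

# Unicode bidi control characters, keyed by their usual abbreviations.
BIDI_CONTROLS = {
    "LRM": "\u200E",  # left-to-right mark
    "RLM": "\u200F",  # right-to-left mark
    "LRE": "\u202A",  # left-to-right embedding
    "RLE": "\u202B",  # right-to-left embedding
    "PDF": "\u202C",  # pop directional formatting
    "LRO": "\u202D",  # left-to-right override
    "RLO": "\u202E",  # right-to-left override
}


def find_bidi_controls(text):
    """Return (index, abbreviation) for each bidi control found in text."""
    by_char = {ch: name for name, ch in BIDI_CONTROLS.items()}
    return [(i, by_char[ch]) for i, ch in enumerate(text) if ch in by_char]


def bidi_class(name):
    """Look up the bidi class the Unicode database assigns to a control."""
    return unicodedata.bidirectional(BIDI_CONTROLS[name])
```

    The marks LRM and RLM behave like invisible strong characters (bidi
    classes L and R), whereas the embedding and override characters
    carry classes of their own.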

    2. We do not support LRE/RLE characters, and I do not currently
    expect this to be in place for 2.0, as this is nowhere near as
    important as the other outstanding issues, and depends on (3) below.

    3. The Unicode algorithm prescribes that the embedding levels have to
    be resolved on an entire paragraph, prior to line breaking. The lines
    are then broken and individual lines are reordered according to the
    embedding levels. At present we do line breaking first and then
    reorder individual lines. This approach is more efficient, since when
    text is modified we only need to process the affected lines, not the
    entire paragraph. The only situation in which this sequence would
    produce different results, AFAIK, is when the text contains
    LRO/LRE/RLO/RLE characters. Since we do not use LRO/RLO internally,
    this will only become an issue when implementing the LRE/RLE support.
    I think the Unicode processing order could be implemented with only a
    small performance loss, but will not know until I try it. It would,
    however, require a number of changes to block/line/run classes that
    are likely to introduce a new set of bugs -- I do not want to do this
    prior to 2.0, to avoid the situation we have with the present STABLE
    (as far as bidi is concerned, that is).
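
    The per-line reordering step mentioned above can be sketched as
    follows. This is a minimal illustration of rule L2 of the Unicode
    bidi algorithm, not AbiWord code: it assumes the embedding levels
    have already been resolved (here they are simply passed in), and the
    function name is hypothetical.

```python
def reorder_line(chars, levels):
    """Reorder one line from logical to visual order (Unicode rule L2).

    chars  -- the characters of the line in logical order
    levels -- the resolved embedding level of each character
    From the highest level down to the lowest odd level, every maximal
    run of characters at that level or higher is reversed.
    """
    if len(chars) != len(levels):
        raise ValueError("chars and levels must have the same length")
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        return "".join(chars)  # purely left-to-right line, nothing to do
    order = list(range(len(chars)))
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]  # reverse this run in place
            i = j
    return "".join(chars[k] for k in order)
```

    With upper case standing in for right-to-left letters,
    reorder_line("abc DEF ghi", [0,0,0,0,1,1,1,0,0,0,0]) gives
    "abc FED ghi". The function only ever sees one line, which is why
    the order of the two preceding steps (level resolution and line
    breaking) is what distinguishes the two approaches.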

    D. Pre-2.0 work
    Essentially, the bulk of what urgently needs to be done prior to 2.0
    is fixing up the imp/exp code, so that at the least text imp/exp, MS
    Word imp and RTF imp/exp are rock solid. For this I need help from
    bidi users in testing and bug-filing, so please do not be shy in
    filing new bugs (please add 'bidi' to the keywords, and make the
    sample docs as short as possible).

    Tomas



    This archive was generated by hypermail 2.1.4 : Tue Apr 29 2003 - 15:42:52 EDT