Re: Some thoughts about BiDi Quirks Handling

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Sep 22 2002 - 00:44:38 EDT

     --- Dom Lachowicz <doml@appligent.com> wrote:
    > On Sat, 2002-09-21 at 08:21, Martin Sevior wrote:
    > >
    > >
    > > On Sat, 21 Sep 2002, Omer Zak wrote:
    > >
    > > > Due to Microsoft's actions, there is more than
    > > > one version of the BiDi algorithm in use. The
    > > > Hebrew versions of different MS-Word versions
    > > > implement slightly different BiDi algorithms. I
    > > > don't know what the situation is in Arabic and
    > > > Persian.
    > > >
    > > > In order for AbiWord and other Free Software
    > > > word processors to fully replace Microsoft
    > > > wordprocessors, it must be possible for them to
    > > > faithfully import files produced by Microsoft
    > > > wordprocessors, in *.doc and/or *.rtf formats.
    > > >
    > > > But in addition to support for those file
    > > > formats, the Free Software wordprocessors need
    > > > to implement compatible BiDi algorithms, so that
    > > > the imported documents will look the same in
    > > > both Microsoft wordprocessors and in Free
    > > > wordprocessors.
    > > >
    > > > The question is how to accommodate the above.
    > > >
    > > > Possible approaches:
    > > > --------------------
    > > > 1. Implement a few "quirks" flags in FriBidi (the
    > > > FriBidiEnv structure has
    > > > room for those flags). When set, those flags
    > > > would cause FriBidi to
    > > > behave like one of the incompatible BiDi
    > > > algorithms.
    > > > 2. Add to FriBidi special code, which would flag
    > > > the characters in a
    > > > string, which would be reordered differently
    > > > in the different BiDi
    > > > algorithms. The wordprocessor would then
    > > > display those characters in a
    > > > different way, and let the user determine
    > > > whether they were reordered
    > > > the way he intended to or not.
    > > > 3. The import filters, used by the Free
    > > > wordprocessors to import files
    > > > created by a given version of Microsoft
    > > > wordprocessor, would insert
    > > > explicit BiDi overrides at those places where
    > > > the BiDi algorithms would
    > > > produce different visual orderings of
    > > > characters.
    > > >
    > > > Of the possible approaches, I most favor the
    > > > "quirks" flags approach.
    > > > When importing a document, the user has to
    > > > specify which wordprocessor created it anyway.
    > > > It is not a big deal to also specify the
    > > > corresponding BiDi algorithm version.
    > > >
    > >
    > > Actually you don't need to do this for AbiWord. It
    > > detects file type automatically. I don't know if
    > > wv (our MS Word import library) could be made to
    > > also detect which version of MS created it. If so
    > > the user would not need to enter this information.
    > >
    > > Dom, is this true? Is there an MS Word version
    > > recorded in the *.doc format?
    >
    > I prefer the "Quirks" approach suggested by Omer.
    > MSWord's FIB (file information block) tells you
    > which version of the Microsoft Word format the
    > document is saved in. We can faithfully identify
    > versions 2 through 9 at the moment.

    I think the first and most important step here is to
    build a full and descriptive list of all the quirks,
    and hopefully decide what a 100% quirk-free bidi
    algorithm would be. Would the Unicode bidi algorithm
    be perfect if everybody implemented it perfectly?
    Is everybody going to agree on what it should be? I
    hope so. It sounds like each quirk should be assigned
    a flag which FriBidi can query, and import/export code
    should set these flags according to the format
    (hopefully we won't have to take in user prefs for
    things like this if there is truly only one right
    way).
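
    To make the flag idea a bit more concrete, here is a rough
    sketch in Python rather than FriBidi's C. Everything in it
    is an assumption for illustration: the quirk names, the
    version-to-quirk table, and the helper functions are all
    made up, since the real list of quirks has not been
    compiled yet. It only shows the shape: one flag per quirk,
    set by the importer from the detected format version and
    queried by the bidi code.

```python
from enum import Flag, auto


class BidiQuirk(Flag):
    """One bit per known quirk. The names below are hypothetical placeholders."""
    NONE = 0
    HYPHEN_JOINS_RTL_SUFFIX = auto()
    NEUTRALS_TAKE_PARAGRAPH_DIR = auto()


# Placeholder table: which quirks an importer would switch on for a given
# Word format version. The version numbers and assignments are invented;
# the real table would come from the compiled list of quirks.
QUIRKS_BY_WORD_VERSION = {
    6: BidiQuirk.HYPHEN_JOINS_RTL_SUFFIX,
    8: BidiQuirk.HYPHEN_JOINS_RTL_SUFFIX | BidiQuirk.NEUTRALS_TAKE_PARAGRAPH_DIR,
}


def quirks_for_import(word_version):
    """Return the quirk flags for a detected version; unknown versions get none."""
    return QUIRKS_BY_WORD_VERSION.get(word_version, BidiQuirk.NONE)


def has_quirk(active, quirk):
    """What the bidi code would call to ask whether a quirk is switched on."""
    return bool(active & quirk)


if __name__ == "__main__":
    active = quirks_for_import(8)
    print(has_quirk(active, BidiQuirk.NEUTRALS_TAKE_PARAGRAPH_DIR))
```

    With no flags set the algorithm would run in its plain,
    quirk-free form, so native documents never pay for
    compatibility behaviour they don't need.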

    I think I've seen these kinds of quirks in Hebrew with
    hyphens connecting Hebrew endings to English words.
    I got varied results between different Unicode
    editors, Mozilla, and IE; and more variation depending
    on whether I used the ASCII hyphen or the special
    Hebrew hyphen. I think World.abw contains this example.

    Is this one of the known quirks?
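
    One possible contributor to the hyphen differences,
    offered as a guess rather than a diagnosis: the two
    hyphens have different bidi character types in the
    Unicode character database, so even a correct
    implementation treats them differently. The Hebrew hyphen
    (maqaf, U+05BE) is a strong right-to-left character, while
    the ASCII hyphen is a weak type whose direction depends on
    its neighbours. A small Python check using the standard
    unicodedata module (the constant and function names are
    just for this example):

```python
import unicodedata

ASCII_HYPHEN = "\u002d"   # HYPHEN-MINUS
HEBREW_MAQAF = "\u05be"   # HEBREW PUNCTUATION MAQAF


def bidi_class(ch):
    """Return the bidi character type recorded in the Unicode database."""
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for name, ch in (("ASCII hyphen", ASCII_HYPHEN),
                     ("Hebrew maqaf", HEBREW_MAQAF)):
        print(f"{name} U+{ord(ch):04X}: bidi class {bidi_class(ch)}")
```

    This doesn't explain why different programs disagree on
    the same hyphen, only why switching hyphens changes the
    result.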

    Andrew Dunbar.

    > Dom
    >

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Sun Sep 22 2002 - 00:49:02 EDT