Re: Fwd: Unicode Private User Area conflict

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Feb 17 2002 - 10:56:02 GMT


    --- Anthony Fok <anthony@thizlinux.com> wrote:
    > Hello all,
    >
    > On Fri, Feb 15, 2002 at 11:42:02AM +0000, Andrew Dunbar wrote:
    > > ----- Forwarded message from Fencol Yung -----
    > > > I found that the following Unicode Private Use Area
    > > > characters defined in AbiWord 0.9.6 conflict with our charset:
    > > >
    > > > UCS_FIELDSTART 0xe000
    > > > UCS_FIELDEND 0xe001
    > > > UCS_BOOKMARKSTART 0xe002
    > > > UCS_BOOKMARKEND 0xe003
    > > >
    > > > I read in the mail archives that these characters were
    > > > actually moved from elsewhere. If I move them to another
    > > > position to resolve the conflict, will that create any
    > > > drawbacks? And what is the purpose of these special characters?
    >
    > Yes, these four codepoints are at the very beginning of the
    > Unicode PUA, and thus clash with at least two major charsets:
    > GB18030 and BIG5-HKSCS. (GB18030: the new simplified Chinese
    > encoding standard that maps to Unicode one-to-one; BIG5-HKSCS:
    > an extension to the traditional Chinese Big5 encoding, by the
    > Hong Kong government.)
    >
    > As a matter of fact, we ran into the same problem about a month
    > ago when our product was being certified for GB18030 compliance
    > at the official Chinese Testing Agency. Three of the GB18030 test
    > documents map to the Unicode PUA range U+E000 to U+E765. Loading
    > the first of these three test documents would cause AbiWord to
    > crash immediately.
    >
    > We had to move these {FIELD,BOOKMARK}{START,END} codes out of the
    > way (temporarily to U+F000..U+F003) in order to pass the
    > certification.
    >
    > But I agree that even putting them in U+F000..U+F003 is problematic.
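The following Python sketch is a hypothetical import-time check, not AbiWord code. It shows why such a test document is dangerous. It assumes the internal markers sit at U+E000..U+E003, as listed for AbiWord 0.9.6 above. It reports every character of decoded input that an editor would misread as one of those markers.

```python
# Internal marker code points as listed for AbiWord 0.9.6 (U+E000..U+E003).
# The names mirror the UCS_* constants quoted above; the dict is illustrative.
INTERNAL_MARKERS = {
    0xE000: "UCS_FIELDSTART",
    0xE001: "UCS_FIELDEND",
    0xE002: "UCS_BOOKMARKSTART",
    0xE003: "UCS_BOOKMARKEND",
}


def marker_collisions(text):
    """Return (offset, code point, marker name) for each character of
    imported text that would be mistaken for an internal marker."""
    hits = []
    for offset, ch in enumerate(text):
        name = INTERNAL_MARKERS.get(ord(ch))
        if name is not None:
            hits.append((offset, ord(ch), name))
    return hits


# Stand-in for text decoded from a document that uses the PUA, e.g. a
# GB18030 user-defined character that maps to U+E001.
sample = "ab\ue001c"
print(marker_collisions(sample))  # [(2, 57345, 'UCS_FIELDEND')]
```

Any non-empty result means the document's own characters and the editor's control codes share code points.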
    >
    > > Yes, I moved them. I think we only had the first two at that
    > > time. They are used internally by AbiWord and are assumed never
    > > to appear in an imported document. This is not a good assumption,
    > > and IMHO we really need to redesign this part of the code to use
    > > some kind of out-of-band data instead of overloading character
    > > codes. Possibly we can find some truly "never to be used"
    > > characters, but I doubt it. The people who know these parts of
    > > the code (please grep for them) should be able to discuss the
    > > whys, wherefores, and possible solutions on this list. Before I
    > > moved them, they conflicted (from memory) with illegal and/or BOM
    > > codes, which messed with importers and exporters and generally
    > > seemed like a bad idea.
    >
    > > Hope someone has a good idea to fix this. Merely moving them
    > > around is probably going to keep breaking somebody's private
    > > stuff here and there...
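The "out-of-band data" idea can be sketched in Python. This assumes a simple list-of-items document model, which is not how AbiWord's piece table is actually organized. The names `MarkerKind`, `Marker`, `plain_text` and `markers` are hypothetical. Markers become separate objects in the sequence, so no code point, PUA or otherwise, needs to be reserved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class MarkerKind(Enum):
    # Hypothetical counterparts of the four UCS_* control codes.
    FIELD_START = 1
    FIELD_END = 2
    BOOKMARK_START = 3
    BOOKMARK_END = 4


@dataclass(frozen=True)
class Marker:
    """A structural marker stored beside the text, never as a character."""
    kind: MarkerKind


# A document is a flat sequence of text chunks and markers.
Item = Union[str, Marker]


def plain_text(items: List[Item]) -> str:
    """Concatenate the character data only. Any code point, including
    U+E000..U+F8FF from an imported GB18030 or Big5 document, passes
    through unchanged because markers never live in the text itself."""
    return "".join(item for item in items if isinstance(item, str))


def markers(items: List[Item]) -> List[MarkerKind]:
    """List the marker kinds in document order."""
    return [item.kind for item in items if isinstance(item, Marker)]


doc: List[Item] = [
    "User char: \ue000 ",
    Marker(MarkerKind.FIELD_START),
    "42",
    Marker(MarkerKind.FIELD_END),
]
print(repr(plain_text(doc)))
print(markers(doc))
```

The trade-off is that code walking the text must handle two item types instead of one. In exchange, imported characters can no longer be confused with editor structure.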
    >
    > I agree. Nevertheless, I think we do need to move them now until
    > a better solution is found. The Unicode PUA in the Basic
    > Multilingual Plane is U+E000..U+F8FF. The range U+E000..U+E765 is
    > explicitly defined as three User Defined Areas (UDAs) in the
    > GB18030 standard, so we must stay out of this range; otherwise
    > AbiWord would not comply with GB18030 (mandatory in Mainland
    > China). (Yes, the mapping table goes higher, but no one uses
    > anything that high yet, not even the GB18030 test documents. :-)
    >
    > U+E000..U+F848 maps to the EUDC (End-User Defined Characters) in
    > the CP950 / BIG5 / BIG5-HKSCS standards as compatibility
    > codepoints, and it would be best to stay out of this range too.
    > There is no mapping from BIG5-HKSCS to U+F849..U+F8FF.
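The range reasoning in the last two paragraphs can be checked mechanically. This minimal Python sketch uses only the ranges stated in this thread:

- the BMP PUA;
- the GB18030 UDAs at U+E000..U+E765;
- the CP950/Big5/HKSCS EUDC mapping at U+E000..U+F848.

It subtracts the reserved ranges from the PUA and prints what is left. The helper `free_ranges` is an illustrative name, not part of any library.

```python
# Inclusive code point ranges as stated in this thread.
BMP_PUA = (0xE000, 0xF8FF)
RESERVED = {
    "GB18030 UDAs": (0xE000, 0xE765),
    "CP950/Big5/HKSCS EUDC": (0xE000, 0xF848),
}


def free_ranges(outer, reserved):
    """Return the sub-ranges of `outer` not covered by any reserved range,
    as a sorted list of inclusive (start, end) tuples."""
    start, end = outer
    free = []
    cursor = start  # first code point not yet known to be reserved
    for lo, hi in sorted(reserved):
        if hi < cursor or lo > end:
            continue  # no overlap with the remaining part of `outer`
        if lo > cursor:
            free.append((cursor, lo - 1))  # gap before this reserved range
        cursor = max(cursor, hi + 1)
    if cursor <= end:
        free.append((cursor, end))
    return free


for lo, hi in free_ranges(BMP_PUA, RESERVED.values()):
    print(f"U+{lo:04X}..U+{hi:04X}")
```

With the two reserved ranges above, the only gap left is U+F849..U+F8FF.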
    >
    > So, for now, U+F849..U+F8FF is free. I suggest putting the four
    > AbiWord internal control codes in U+F850..U+F853 for now. Yes,
    > this only solves the symptom, but it is important to have this
    > fix _now_ for GB18030 and HKSCS compliance. This will do until
    > the real cure comes. :-)
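Here is a sketch of the proposed relocation, written in Python for brevity. It is not AbiWord's actual C++ header or the attached patch. The four constants take the suggested values U+F850..U+F853. The hypothetical helper `check_internal_codes` confirms they are distinct and fall inside the free range U+F849..U+F8FF.

```python
# Proposed values from this thread; the names mirror AbiWord's UCS_* constants.
UCS_FIELDSTART = 0xF850
UCS_FIELDEND = 0xF851
UCS_BOOKMARKSTART = 0xF852
UCS_BOOKMARKEND = 0xF853

# Part of the BMP PUA left unmapped by the GB18030 UDAs and Big5/HKSCS EUDC.
FREE_RANGE = (0xF849, 0xF8FF)


def check_internal_codes(codes, free_range=FREE_RANGE):
    """Raise ValueError unless every code is unique and lies in free_range."""
    lo, hi = free_range
    if len(set(codes)) != len(codes):
        raise ValueError("internal codes must be distinct")
    for code in codes:
        if not lo <= code <= hi:
            raise ValueError(f"U+{code:04X} is outside U+{lo:04X}..U+{hi:04X}")
    return True


check_internal_codes(
    [UCS_FIELDSTART, UCS_FIELDEND, UCS_BOOKMARKSTART, UCS_BOOKMARKEND]
)
```

Both the original U+E000..U+E003 values and the temporary U+F000..U+F003 values would fail this check. A unit test like it would therefore catch a later move back into a reserved range, although it only guards the symptom.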
    >
    > A patch is attached. Thanks! :-)

    Can somebody please commit this ASAP (unless somebody sees a
    problem I can't)? I think it's important.

    Andrew Dunbar.

    =====
    http://linguaphile.sourceforge.net http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Sun Feb 17 2002 - 06:00:00 GMT