From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Feb 17 2002 - 10:56:02 GMT
--- Anthony Fok <anthony@thizlinux.com> wrote:
> Hello all,
>
> On Fri, Feb 15, 2002 at 11:42:02AM +0000, Andrew Dunbar wrote:
> > ----- Forwarded message from Fencol Yung -----
> > > I found that the following Unicode Private Use Area characters
> > > defined in AbiWord 0.9.6 conflict with our charset:
> > >
> > > UCS_FIELDSTART 0xe000
> > > UCS_FIELDEND 0xe001
> > > UCS_BOOKMARKSTART 0xe002
> > > UCS_BOOKMARKEND 0xe003
> > >
> > > I read in the mail archives that these characters were actually
> > > moved from elsewhere. If I move them to another position to resolve
> > > the conflict, will that create any drawback? What is the purpose of
> > > these special characters?
>
> Yes, these four codepoints are at the very beginning of the Unicode
> PUA, and thus clash with at least 2 major charsets: GB18030 and
> BIG5-HKSCS. (GB18030: New simplified Chinese encoding standard that
> maps to Unicode one-to-one; BIG5-HKSCS: Extension to the traditional
> Chinese Big5 encoding, by the Hong Kong government.)
>
> As a matter of fact, we ran into the same problem about a month ago
> when our product was being certified for GB18030 compliance at the
> official Chinese Testing Agency. Three of the GB18030 test documents
> map to Unicode PUA U+E000 to U+E765. Loading the first of these 3
> test documents would cause AbiWord to crash immediately.
>
> We had to move these {FIELD,BOOKMARK}{START,END} out of the way (to
> U+F000..U+F003 temporarily) in order to pass the certification.
>
> But I agree that even putting them in U+F000..U+F003 is problematic.
>
> > Yes, I moved them. I think we only had the first two at that time.
> > They are used internally by AbiWord and are assumed not to occur in
> > any imported document. This is not a good assumption, and we really
> > need to redesign this part of the code IMHO to use some kind of
> > out-of-band data instead of overriding the characters. Possibly we
> > can find some true "never to be used" characters, but I doubt it.
> > The people who know these parts of the code (please grep for them)
> > should be able to discuss the whys, wherefores, and possible
> > solutions on this list. Before I moved them, they conflicted with
> > illegal and/or BOM codes (from memory), which messed with importers
> > and exporters and generally seemed like a bad idea.
> >
> > Hope someone has a good idea to fix this. Merely moving them around
> > is probably going to keep breaking somebody's private stuff here and
> > there...
>
> I agree. Nevertheless, I think we do need to move them now until a
> better solution is found. The Unicode PUA is in U+E000..U+F8FF.
> The range U+E000..U+E765 is explicitly set as three User Defined
> Areas (UDAs) in the GB18030 standard, so we must stay out of this
> range, otherwise AbiWord would not comply with GB18030 (mandatory in
> Mainland China). (Yes, the mapping table goes higher, but no one uses
> anything that high yet, not even the GB18030 test documents. :-)
>
> U+E000..U+F848 maps to the EUDC (End-User Defined Characters) in the
> CP950 / BIG5 / BIG5-HKSCS standard as compatibility codepoints, and
> it would be best to stay out of this range too. There is no mapping
> from BIG5-HKSCS to U+F849..U+F8FF.
>
> So, for now, U+F849..U+F8FF is free. I suggest putting the four
> AbiWord internal control codes in U+F850..U+F853 for now. Yes, this
> only treats the symptom, but it is important to have this fix _now_
> for GB18030 and HKSCS compliance. This will do until the real cure
> comes. :-)
>
> A patch is attached. Thanks! :-)
Can somebody please commit this ASAP (unless somebody
sees a problem I can't)? I think it's important.
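
For anyone who wants to check the range reasoning quoted above, here is a
rough sketch in Python (not AbiWord code). The ranges are only the ones
stated in this thread, not taken from the GB18030 or HKSCS documents
themselves, and the helper names conflicts() and in_pua() are made up for
illustration.

```python
# Ranges as quoted in this thread (assumption: not verified against the
# GB18030 or BIG5-HKSCS specifications).
PUA = (0xE000, 0xF8FF)  # Unicode Private Use Area in the BMP
RESERVED = {
    "GB18030 UDAs": (0xE000, 0xE765),
    "BIG5-HKSCS EUDC": (0xE000, 0xF848),
}
# The four internal control codes named in the original report.
MARKERS = ["UCS_FIELDSTART", "UCS_FIELDEND",
           "UCS_BOOKMARKSTART", "UCS_BOOKMARKEND"]


def conflicts(base, count=len(MARKERS)):
    """Names of reserved ranges that overlap base..base+count-1."""
    last = base + count - 1
    return [name for name, (lo, hi) in RESERVED.items()
            if base <= hi and last >= lo]


def in_pua(base, count=len(MARKERS)):
    """True if the whole block base..base+count-1 lies inside the PUA."""
    return PUA[0] <= base and base + count - 1 <= PUA[1]


if __name__ == "__main__":
    # Original position, temporary position, proposed position.
    for base in (0xE000, 0xF000, 0xF850):
        print(f"U+{base:04X}: in PUA={in_pua(base)}, "
              f"conflicts={conflicts(base)}")
```

Under those stated ranges, only the block starting at U+F850 comes back
with no conflicts, which matches the proposal.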
Andrew Dunbar.
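
P.S. To make the "out-of-band data" idea from the quoted message a bit
more concrete, here is a very rough Python sketch. It is not how AbiWord
stores anything today; the Run and Marker classes and the kind labels are
hypothetical. The point is only that markers kept beside the text, rather
than in it, cannot collide with anybody's private-use characters.

```python
from dataclasses import dataclass, field


@dataclass
class Marker:
    offset: int  # index into Run.text that the marker sits before
    kind: str    # hypothetical label, e.g. "bookmark_start"


@dataclass
class Run:
    text: str
    markers: list = field(default_factory=list)

    def add_marker(self, offset, kind):
        """Record a marker beside the text instead of inside it."""
        if not 0 <= offset <= len(self.text):
            raise ValueError("marker offset outside text")
        self.markers.append(Marker(offset, kind))
        # Keep markers ordered by position (sort is stable for ties).
        self.markers.sort(key=lambda m: m.offset)


# A user-defined character at U+E000 stays ordinary text; the bookmark
# around it lives in the marker list.
run = Run("see \ue000 here")
run.add_marker(4, "bookmark_start")
run.add_marker(5, "bookmark_end")
```

Importers and exporters would then never need to reserve any code point,
at the cost of keeping the offsets up to date whenever the text is edited.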
=====
http://linguaphile.sourceforge.net http://www.abisource.com