Re: Commit: string classes rearranged

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Fri Jan 03 2003 - 21:18:15 EST


     --- "j.m.maurer" <j.m.maurer@student.utwente.nl> wrote:
    > On Fri 03-01-2003, at 22:17, Dom Lachowicz wrote:
    > > I'll re-enable the UCS2 methods on the UCS4 and UTF8
    > > string classes and simply remove the UCS2 string class
    > > and other silly ucs2 functions.
    > >
    >
    > Great! That's all I need (and should be needed)

    Sorry for any trouble and/or confusion guys (:
    One of the main reasons I wanted to get rid of the
    UCS-2 stuff (or deprecate it at least) is that there
    is a *huge* amount of confusion between UCS-2 and
    UTF-16. They look compatible but they are not.
    In some major contexts the names are misused,
    especially by Microsoft, who always call both UCS-2 and
    UTF-16 "UCS-2"! Java also seems to be in a grey area:
    it is described as UCS-2, but nobody really seems to
    know which encoding it in fact supports. C# seems to
    have followed in the footsteps of Java ):

    If you think you have a use for UCS-2, please do *all*
    you can to find out whether you really need UCS-2 or
    UTF-16. The difference is that UCS-2 can only represent
    the 65,536 code points of the BMP (Basic Multilingual
    Plane), whereas UTF-16 can represent everything Unicode
    can. UTF-16 does this by encoding each character
    outside the BMP as two 16-bit units, known as a
    surrogate pair.
    Find out whether WordPerfect supports non-BMP
    characters. If it does, then we need new UTF-16
    functions or a new UTF-16 class. (We don't really: just
    use the UCS-4 or UTF-8 classes and convert using
    UT_iconv in the places you need to.)
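
    To make the surrogate pair idea concrete, here is a
    minimal sketch in Python (not AbiWord code; the function
    name utf16_units is made up for illustration) of how
    UTF-16 turns one code point into one or two 16-bit
    units:

```python
def utf16_units(code_point):
    """Return the 16-bit code units UTF-16 uses for one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0xFFFF:
        # Inside the BMP: a single unit, the same in UCS-2 and UTF-16.
        return [code_point]
    # Outside the BMP: split the 20-bit offset across two surrogates.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


print([hex(u) for u in utf16_units(0x00E9)])   # ['0xe9']
print([hex(u) for u in utf16_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```

    A UCS-2 reader has no notion of the second case: it
    sees 0xD834 and 0xDD1E as two separate "characters".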

    To find out whether WordPerfect uses UTF-16, Google
    should turn up some test documents or pages containing
    non-BMP characters that you ought to be able to import
    into WordPerfect. Then save them as native WordPerfect
    files and reopen them. If the non-BMP characters come
    through intact, then WordPerfect actually supports
    UTF-16 and not UCS-2!
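
    The comparison step could be sketched roughly like this
    in Python. It assumes you have already got the text of
    the document before and after the round trip into two
    strings (how to do that with WordPerfect is not covered
    here), and the helper names are hypothetical:

```python
def non_bmp(text):
    """Return the characters of text that lie outside the BMP."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def survives_round_trip(original, reopened):
    """True if the non-BMP characters of original are intact in reopened."""
    expected = non_bmp(original)
    if not expected:
        # Without a non-BMP character the test says nothing about UTF-16.
        raise ValueError("test document has no non-BMP characters")
    return expected == non_bmp(reopened)


sample = "clef: \U0001D11E"
# Replacement characters stand in for whatever a UCS-2-only
# application might produce; this is only an illustration.
damaged = "clef: \uFFFD\uFFFD"

print(survives_round_trip(sample, sample))   # True
print(survives_round_trip(sample, damaged))  # False
```

    Note that the check refuses a test document with no
    non-BMP characters, since such a document would pass
    with either encoding.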

    Using the two interchangeably can and will corrupt
    your data! Also note that in my experience AbiWord
    cannot yet handle characters outside the BMP at all,
    at least on Unix ):
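
    As one illustration of the kind of corruption I mean,
    here is a small Python sketch (hypothetical helper
    names, nothing from the AbiWord tree) where code that
    treats every 16-bit unit as a whole character, UCS-2
    style, truncates UTF-16 data in the middle of a
    surrogate pair:

```python
def ucs2_truncate(utf16_be_bytes, max_units):
    """Cut to max_units 16-bit units, as UCS-2-minded code would."""
    return utf16_be_bytes[: 2 * max_units]


def is_valid_utf16_be(data):
    """True if data decodes as well-formed big-endian UTF-16."""
    try:
        data.decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False


text = "a\U0001D11Eb"            # 3 characters, one outside the BMP
data = text.encode("utf-16-be")  # 4 code units: 'a', high, low, 'b'

print(len(text), len(data) // 2)                   # 3 4
print(is_valid_utf16_be(data))                     # True
print(is_valid_utf16_be(ucs2_truncate(data, 2)))   # False
```

    Cutting after two units keeps the high surrogate and
    drops the low one, so what is left is no longer valid
    UTF-16.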

    Hope this makes some sense (:

    Andrew Dunbar.

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com



