EncodingManager usage note

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Fri Jan 24 2003 - 21:56:16 EST

    I just wanted to warn those of us who may need to use
    certain EncodingManager methods about a slight problem
    I've noticed.

    To get the names of certain encodings used by the
    system, for use with iconv, there are a few methods
    provided:

    getNativeEncodingName()
    getNativeSystemEncodingName()
    getNative8BitEncodingName()
    getNativeUnicodeEncodingName()

    I've noticed that some code treats the last two as
    though they are mutually exclusive. This is not the
    case at all. As noted in the comment/documentation
    for getNative8BitEncodingName() - it is perfectly
    okay for it to return UTF-8 or any multibyte CJK
    encoding - the only requirement is that the encoding
    is a superset of ASCII.
    Some code is calling getNative8BitEncodingName() to
    get an ISO-8859-x encoding when the native encoding
    is UTF-8.
    I believe there are some very subtle bugs which may be
    due to this.
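
To illustrate the kind of bug this can cause, here is a minimal sketch in plain C++. It does not use EncodingManager at all; countBytesAsChars() and countUtf8CodePoints() are hypothetical helpers made up for this example. It only shows that an ASCII-superset encoding such as UTF-8 need not be one byte per character, so code that assumes an ISO-8859-x style layout can miscount.

```cpp
#include <cstddef>
#include <string>

// Naive count: assumes one byte per character. This only holds for
// single-byte encodings such as ISO-8859-1.
std::size_t countBytesAsChars(const std::string& text) {
    return text.size();
}

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumes the input is well-formed UTF-8; no validation is done.
std::size_t countUtf8CodePoints(const std::string& text) {
    std::size_t count = 0;
    for (unsigned char c : text) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the word "café", the ISO-8859-1 bytes "caf\xE9" and the UTF-8 bytes "caf\xC3\xA9" give the same result only when counted with the routine that matches the encoding actually in use.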

    The correct fix is to add yet another method:
    getNativeNonUnicodeEncodingName()

    which will never return UTF-8 on *nix, BeOS, QNX, or OS X
    and will never return UCS-2 or UTF-16 on Windows.
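
A rough sketch of what such a method might do, written as free functions so it stands alone. Everything here is an assumption for illustration: isUnicodeEncodingName() and nonUnicodeEncodingName() are hypothetical names, the prefix test for "UTF" and "UCS" is a simplification, and the fallback encoding is supplied by the caller rather than looked up from the locale as a real implementation would probably need to do.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if the name looks like a Unicode encoding
// (UTF-8, UTF-16, UCS-2, ...). Matches on the "UTF"/"UCS" prefix only,
// ignoring case.
bool isUnicodeEncodingName(const std::string& name) {
    std::string upper(name);
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    // rfind(prefix, 0) == 0 is a "starts with" test.
    return upper.rfind("UTF", 0) == 0 || upper.rfind("UCS", 0) == 0;
}

// Hypothetical stand-in for the proposed method: returns the native
// encoding name unless it is a Unicode encoding, in which case the
// caller-supplied non-Unicode fallback is returned instead.
std::string nonUnicodeEncodingName(const std::string& nativeName,
                                   const std::string& fallback) {
    return isUnicodeEncodingName(nativeName) ? fallback : nativeName;
}
```

With this sketch, a native name of "UTF-8" and a fallback of "ISO-8859-1" yields "ISO-8859-1", while a native name of "ISO-8859-2" is passed through unchanged.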

    There is a slight and subtle semantic overlap and I
    apologize for the confusing nature of this. I'd like
    to implement this myself right now but my internet
    cafe bill today is already astronomical ):

    So just be careful with encodings and keep up the
    good work!

    Andrew Dunbar.

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Fri Jan 24 2003 - 21:59:16 EST