Re: Encoding issues

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sat Nov 02 2002 - 22:20:41 EST


     --- Christian Biesinger <cbiesinger@web.de> wrote:
    > Hi,

    Hi Christian.

    > so you may remember that some time ago, I checked in
    > a patch to change the encoding that AP_DiskStringSet
    > uses to whatever XAP_App::getDefaultEncoding uses
    > (or something like that, can't remember what exactly
    > I did :) ).
    >
    > Anyway, it looks like this broke non-US-ASCII
    > characters in the statusbar, because of this piece
    > of code in ap_StatusBar.cpp, line 493, in
    > AP_StatusBar::setStatusMessage(const char * pBuf,
    > int redraw)
    > UT_UCS4_strcpy_char(bufUCS,pBuf);
    >
    > That function just uses the encoding that the
    > default constructor of mbtowc thinks is good as the
    > source encoding. That seems to be ISO-8859-1 for me.
    > However, due to the patch I mentioned above, that
    > string is already in UTF-8.
    > This means that the statusbar will not display
    > special characters (like, but not limited to, German
    > umlauts) correctly. Instead, it will show characters
    > looking like undecoded UTF-8 (like Ì)
    >
    > So... the question is:
    > What's the best way for fixing this?
    > Should UT_UCS4_strcpy_char take an additional (maybe
    > optional) argument, specifying the charset to
    > convert from? AP_StatusBar would pass the result of
    > XAP_App::getDefaultEncoding to it, and this would
    > work...

    There is only one way to fix this. We are software
    engineers here. Guesswork is not and has not ever
    been an integral part of what engineers do. What we
    should do is *find out* the *correct encoding* for
    the destination we send a string to, *always*, and
    use that encoding. I've said it before and I'll say
    it again, having a default constructor for mbtowc and
    wctomb is just begging for bugs. There should never
    be a time when we convert an encoding without knowing
    what encoding we want. Would you go to a money
    changer without knowing what currency or exchange rate
    you want? How on earth we're supposed to do better
    than Microsoft when we leave these things open to
    chance time after time is completely beyond me.
    So maybe some people think encoding is a hard problem.
    In that case, look through the code or ask on the list
    before writing code and committing it when it's all
    based on guesswork.
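
    To make the point concrete, here is a minimal sketch of a
    conversion routine that cannot be called without naming the
    source encoding. The names SourceEncoding and toUcs4 are
    hypothetical and are not AbiWord's API; the UTF-8 decoder is
    simplified and does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names for illustration only; not AbiWord's actual API.
enum class SourceEncoding { Latin1, Utf8 };

// Decode `bytes` into UCS-4 code points. The caller must state the source
// encoding explicitly; there is deliberately no default argument.
std::vector<std::uint32_t> toUcs4(const std::string& bytes, SourceEncoding enc)
{
    std::vector<std::uint32_t> out;

    if (enc == SourceEncoding::Latin1) {
        // ISO-8859-1 maps every byte to the code point of the same value.
        for (unsigned char c : bytes)
            out.push_back(c);
        return out;
    }

    // UTF-8: a lead byte announces how many continuation bytes follow.
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

    Decoding the two bytes 0xC3 0xBC as UTF-8 gives the single
    code point U+00FC (u with umlaut), while decoding the same
    bytes as ISO-8859-1 gives U+00C3 followed by U+00BC, which is
    the kind of garbage Christian describes in the statusbar.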

    Sorry I got into a rant (:

    Now the encoding needed by the status bar will depend
    on the OS. There should be functions in the
    EncodingManager these days to give the encoding of the
    OS and the encoding of the GUI. I think the GUI
    encoding is currently covered by something like
    defaultSystemEncoding. Experience with the cross-platform
    (XP) code has shown that the user can often set an encoding
    for himself. On Unix this is done via the $LANG environment
    variable. The system will usually have an encoding
    it likes to use for its own stuff. This varies from
    system to system. On QNX, BeOS, and OS X this seems
    to be UTF-8. On Windows this can be set in the
    Control Panel right next to where the user can set
    his preferred locale. There are APIs to get both.
    In a Win32 Unicode build (which we don't yet support
    but which we need), this will always be UCS-2 or
    UTF-16.
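
    As a rough sketch of the Unix case, the user's encoding can be
    read from the locale name, which conventionally has the form
    language[_territory][.codeset][@modifier]. The helper names
    below are hypothetical and not part of the EncodingManager.
    Checking LC_ALL and LC_CTYPE before LANG is an addition here,
    following the usual POSIX precedence; real code would more
    likely call setlocale() and nl_langinfo(CODESET).

```cpp
#include <cstddef>
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical helper: extract the codeset part of a POSIX-style locale
// name such as "de_DE.UTF-8" or "de_DE.ISO-8859-1@euro".
// Returns an empty string when the name carries no codeset (e.g. "C").
std::string codesetFromLocaleName(const std::string& locale)
{
    const std::size_t dot = locale.find('.');
    if (dot == std::string::npos)
        return "";

    // An optional "@modifier" may follow the codeset; cut it off.
    const std::size_t at = locale.find('@', dot);
    const std::size_t len =
        (at == std::string::npos) ? std::string::npos : at - dot - 1;
    return locale.substr(dot + 1, len);
}

// Hypothetical helper: find the user's codeset from the environment,
// checking LC_ALL, then LC_CTYPE, then LANG. An empty result means
// "unknown", and the caller should ask elsewhere rather than guess.
std::string userCodesetFromEnvironment()
{
    for (const char* name : {"LC_ALL", "LC_CTYPE", "LANG"}) {
        const char* value = std::getenv(name);
        if (value != nullptr && *value != '\0')
            return codesetFromLocaleName(value);
    }
    return "";
}
```

    For example, codesetFromLocaleName("de_DE.UTF-8") yields
    "UTF-8", and a bare "C" yields an empty string rather than a
    guessed default.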

    With the old Gnome and GTK, the GUI used an ISO
    encoding, maybe depending on the default language.
    With the new Gnome and GTK, the GUI *always* uses
    UTF-8. So the statusbar also must use UTF-8.
    Perhaps it is now a good idea to add a new GUIEncoding
    to the other encodings in the EncodingManager to make
    it more obvious which one to use - especially since
    it appears with new GTK/Gnome that it may be
    different from the system encoding.

    Sorry for grumbling. We still have encoding problems
    popping up relatively frequently and also have wrong
    fixes going in fairly often. I hope this has gone a
    little way toward clearing up some of the confusion
    and should at least shed light on solving this one
    immediate problem.

    Andrew Dunbar. Mr i18n (:

    > Other ideas?
    >
    > (Should I put this in bugzilla instead?)

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Sat Nov 02 2002 - 22:29:18 EST