Re: Encoding issues

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Nov 03 2002 - 21:36:34 EST


     --- Christian Biesinger <cbiesinger@web.de> wrote:
    > Andrew Dunbar wrote:
    > > In this case we're converting
    > > from ISO-8859-X to UTF-8, then converting from
    > > ISO-8859-X to UTF-8 again, getting the source
    > > encoding wrong the second time. Argh! ):
    >
    > Yeah, this is what we're doing. (though the first
    > time, it might've been from utf-8 or whatever
    > the .strings file is in. anyway, that code is
    > correct).

    The strings files can be in any encoding, but I believe
    Dom has either converted them all to UTF-8 or asked for
    that to be done. Regardless, the XML parsing code *always*
    returns the strings in UTF-8 so that callers never
    need to think about encodings.
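
The double conversion described above is easy to reproduce outside AbiWord. The sketch below assumes the source encoding is ISO-8859-1 (the simplest member of the ISO-8859-X family) and uses a hypothetical helper, latin1ToUtf8, which is not an AbiWord function. Applied once it yields correct UTF-8; applied again to its own output it treats every UTF-8 byte as a Latin-1 character and garbles anything outside ASCII.

```cpp
#include <string>

// Hypothetical helper (not part of AbiWord): convert ISO-8859-1 bytes to UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so bytes below 0x80 pass through and the rest become two-byte sequences.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

For the Latin-1 input "caf\xE9", the first pass gives the bytes 63 61 66 C3 A9, which is valid UTF-8. A second pass over that result gives 63 61 66 C3 83 C2 A9, which a UTF-8 display shows as "cafÃ©".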

    > Anyway, so the question is: What charset is the
    > string passed to setStatusMessage (the char*
    > version) in?

    On which platform? I'm sorry but I don't really have
    access to the source right now.

    > That function wants an UCS-4 string and has a char*.

    It would seem odd for an AbiWord GUI function to want
    UCS-4. I would've thought document functions would
    want UCS-4 and GUI functions would want UTF-8.
    Win32 Unicode functions want UCS-2/UTF-16 but that
    doesn't seem to be what you're asking.

    > If it is always UTF-8, that function could just use
    > UT_convert (I think that's what it's called, might
    > be UT_iconv, can't remember) from UTF-8 to UCS-4 and
    > everyone would be happy. Alternatively, if it's
    > always XAP_App::getDefaultEncoding() that would be
    > fine too, because that could be used instead of
    > UTF-8.
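
If the char* argument really is always UTF-8, the conversion to UCS-4 is mechanical. The following is a minimal sketch only: utf8ToUcs4 is a hypothetical stand-in for whatever the real AbiWord conversion routine is, and it assumes C++11's std::u32string as the UCS-4 container. It rejects bad lead bytes and truncated sequences but does not check for overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not AbiWord's converter): decode UTF-8 into UCS-4.
std::u32string utf8ToUcs4(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append six payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

A decoder like this only gives the right answer when the input encoding is known; feeding it a string in the locale's default encoding would either throw or silently produce the wrong characters.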

    Well, as always, I would recommend tracing through the
    code to find out the correct answer. But my
    assumption is that we are passed .strings values in
    UTF-8 and that XP GUI functions ought to take UTF-8,
    since GTK2, QNX, BeOS, OS X, KDE, and Pango all use
    UTF-8 for their GUI strings. Windows is the
    exception and so ought to be handled in the Win32
    layer.
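
Handling Windows in the Win32 layer means producing UTF-16 there and nowhere else. As a rough illustration, assuming the text has already been decoded into Unicode code points, the only non-trivial step is splitting code points above U+FFFF into surrogate pairs. The name ucs4ToUtf16 is hypothetical and is not taken from the AbiWord source.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of AbiWord): encode code points as UTF-16.
std::u16string ucs4ToUtf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a valid Unicode scalar value");
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one 16-bit unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split 20 bits across a surrogate pair.
            char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 | (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 | (v & 0x3FF)));
        }
    }
    return out;
}
```

Keeping this step inside the platform layer means the cross-platform code never has to know that one front end wants a different encoding.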

    This may not reflect the current code though.

    > Now, before I dig into the code, does anyone know
    > what encoding the strings passed to setStatusMessage
    > are supposed to be in?

    I don't know what they are but they should be UTF-8
    unless somebody can present a good argument otherwise.

    > -biesi, who really wishes Abiword would use a string
    > class for _all kinds of strings_ which also stored
    > the string's encoding.

    I actually started work on this way back when, but
    there were just too many places where people were
    making strings without knowing or caring what encoding
    they were in, and the stored encoding name just ended
    up being wrong half the time. What we really need is
    one encoding used internally at all times, converted
    to clearly stated encodings at the endpoints where the
    various GUIs, APIs, etc. need it.
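
One way to picture the "single internal encoding" idea is a type whose contents are UTF-8 by construction, so the encoding never has to be stored or guessed. This is a sketch under stated assumptions, not a description of any AbiWord class: Utf8Text and its member names are hypothetical, ISO-8859-1 stands in for "some other encoding at an endpoint", and fromUtf8 trusts its caller rather than validating.

```cpp
#include <string>
#include <utility>

// Hypothetical wrapper (not an AbiWord class): the bytes inside are always
// UTF-8. Callers must name the source encoding when they create one.
class Utf8Text {
public:
    // The caller asserts the bytes are already UTF-8 (not validated here).
    static Utf8Text fromUtf8(std::string bytes) {
        return Utf8Text(std::move(bytes));
    }

    // Convert at the boundary: each Latin-1 byte becomes its UTF-8 form.
    static Utf8Text fromLatin1(const std::string& bytes) {
        std::string out;
        for (unsigned char c : bytes) {
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return Utf8Text(std::move(out));
    }

    const std::string& utf8() const { return bytes_; }

private:
    // Private so that a raw std::string cannot be wrapped without the
    // caller stating which encoding it is in.
    explicit Utf8Text(std::string bytes) : bytes_(std::move(bytes)) {}
    std::string bytes_;
};
```

Because the constructor is private, a raw string cannot slip in unlabelled, and a function taking a Utf8Text cannot be handed the same bytes for a second conversion by accident.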

    Andrew.

    > --
    > Fiat iustitia, pereat mundus.
    >

    =====
    http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com



