From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Nov 03 2002 - 21:36:34 EST
--- Christian Biesinger <cbiesinger@web.de> wrote:
> Andrew Dunbar wrote:
> > In this case we're converting
> > from ISO-8859-X to UTF-8, then converting from
> > ISO-8859-X to UTF-8 again, getting the source
> > encoding wrong the second time. Argh! ):
>
> Yeah, this is what we're doing. (though the first
> time, it might've been from utf-8 or whatever
> the .strings file is in. anyway, that code is
> correct).
The strings file can be in any encoding, but I believe
Dom has either converted them all to UTF-8 or asked
for that to be done. Regardless, the XML parsing code
*always* returns the strings in UTF-8 so that callers
never need to think about encodings.
> Anyway, so the question is: What charset is the
> string passed to setStatusMessage (the char*
> version) in?
On which platform? I'm sorry, but I don't really have
access to the source right now.
> That function wants an UCS-4 string and has a char*.
It would seem odd for an AbiWord GUI function to want
UCS-4. I would've thought document functions would
want UCS-4 and GUI functions would want UTF-8.
Win32 Unicode functions want UCS-2/UTF-16 but that
doesn't seem to be what you're asking.
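To illustrate the boundary I have in mind, here is a rough
sketch of turning document-side UCS-4 text into UTF-8 for a
GUI call. It is not AbiWord code: the name ucs4ToUtf8 is
made up for this example, and it assumes a C++11 compiler
so that std::u32string is available.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of AbiWord): encode UCS-4 code
// points as a UTF-8 byte string, e.g. at the point where document
// text is handed to a GUI function.
std::string ucs4ToUtf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        // Surrogates and values above U+10FFFF are not valid characters.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");

        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The point is only that the conversion happens once, at a
clearly marked boundary, rather than wherever a caller
happens to guess at the encoding.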
> If it is always UTF-8, that function could just use
> UT_convert (I think that's what it's called, might
> be UT_iconv, can't remember) from UTF-8 to UCS-4 and
> everyone would be happy. Alternatively, if it's
> always XAP_App::getDefaultEncoding() that would be
> fine too, because that could be used instead of
> UTF-8.
Well, as always, I would recommend tracing through the
code to find out the correct answer. But my
assumption is that we are passed .strings values in
UTF-8 and that XP (cross-platform) GUI functions ought
to take UTF-8, since GTK2, Pango, QNX, BeOS, OS X and
KDE all use UTF-8 for their GUI strings. Windows is
the exception and so ought to be handled in the Win32
layer.
This may not reflect the current code though.
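If the char* really is UTF-8, the conversion to UCS-4 that
Christian describes could look roughly like the sketch
below. This is not the existing UT_ conversion code (I have
not checked what that does); utf8ToUcs4 is a hypothetical
name, and the sketch assumes a C++11 compiler for
std::u32string.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an AbiWord function): decode a UTF-8
// byte string into UCS-4 code points, rejecting malformed input.
std::u32string utf8ToUcs4(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of
    // the given length; used to reject overlong encodings.
    static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;

        // The lead byte tells us how long the sequence is.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < minForLen[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

A strict decoder like this has a useful side effect for the
bug we are chasing: a lone ISO-8859-1 byte such as 0xE9 is
rejected instead of being silently decoded as something
else, so at least some wrongly labelled input shows up as an
error at the boundary.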
> Now, before I dig into the code, does anyone know
> what encoding the strings passed to setStatusMessage
> are supposed to be in?
I don't know what they are but they should be UTF-8
unless somebody can present a good argument otherwise.
> -biesi, who really wishes Abiword would use a string
> class for _all kinds of strings_ which also stored
> the string's encoding.
I actually started work on this way back when, but
there were just too many places where people were
making strings without knowing or caring what encoding
they were in, and the stored encoding name just ended
up being wrong half the time. What we really need is
one encoding used internally at all times, converted
to clearly stated encodings at the endpoints where the
various GUIs, APIs, etc. need it.
Andrew.
> --
> Fiat iustitia, pereat mundus.
>
=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com
This archive was generated by hypermail 2.1.4 : Sun Nov 03 2002 - 21:45:16 EST