Re: how should we localize locale names?


Subject: Re: how should we localize locale names?
From: Paul Rohr (paul@abisource.com)
Date: Thu Mar 08 2001 - 13:14:08 CST


At 08:40 PM 3/8/01 +1100, Tim Allen wrote:
>You beat me to it, Paul, as soon as I saw the dialog screenshot I started
>wondering about exactly this.

I couldn't wait for the screenshot, so I peeked at the code. Otherwise, I
would have been glad to let you beat me to it. I'm sending *entirely* too
much mail these days. ;-)

>The general idea is definitely good. But like Dom, I have slight
>reservations about adding an extra wrinkle to the way our string sets get
>used. This would imply that every language string set would be at least
>partially loaded.

Agreed.  However, if we're ever going to transition to a world where *all* 
locale-specific stuff gets loaded dynamically at run-time -- i.e., toolbars 
and menus, too -- then something like this is unavoidable.  

For this dialog to Just Work in that kind of situation, you need to do a
dynamic scan of some directory or index at run-time to see what locales you
have available.
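
A rough sketch of that run-time scan, in Python for brevity.  It assumes one 
strings file per locale, named after its tag (for example fr-FR.strings); the 
directory layout, the extension, and the function name are illustrative 
assumptions rather than a description of the current tree.

```python
from pathlib import Path


def available_locales(strings_dir):
    """Return the sorted locale tags of every *.strings file in strings_dir.

    Assumes (hypothetically) that each file is named <tag>.strings.
    """
    return sorted(p.stem for p in Path(strings_dir).glob("*.strings"))
```

Something like available_locales("strings") would then return whatever tags 
happen to be installed, which is exactly the list the dialog needs.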

All I was suggesting was that we augment the header of the strings files so
we could quickly get in, grab this string, and get out. (For details, see
below.) We certainly wouldn't want to keep the whole @#$%^&* thing in
memory. Gack.

>I still like the idea of moving to gettext at some point
>in the medium-term future; I actually would like to implement that myself,
>except for minor impediments like work, moving house, a wife and son etc
>etc etc :-).

Ooooh. You might be the first person who's expressed interest in actually
*doing* the work to fix gettext for other platforms, instead of just
complaining about it. Be careful -- this could make you very very popular.
;-)

However, you might also want to look at what Eazel's been doing with XML
i18n tools. I have *no* idea what that's for, but if they're moving away
from gettext, that might be worth paying attention to.

>Adding more cruft to the existing model would seem to make
>such a transition more difficult. Is there some other paradigm we can use?
>We certainly don't want to have to do anything as silly as temporarily
>switching locales to resolve the language name, then switching back.
>
>Maybe we want a list of non-translatable strings somewhere, defined in
>such a way that it's very easy for a new translator to add the name of the
>new language to the list.

You could certainly do that. A static index would be faster to parse than
scanning a directory at run-time to extract the same information from
whatever translations are available. Of course, it could also be out of
sync, but it *is* faster.

Unless someone is willing to do the up-front work to generate a global 
lookup list of UTF-8 locale names in the matching languages, I assume that 
we'd want each translator to add their own lookup.  

Hardwiring this list into the binary would be fairly translator-hostile, so
I expect you'd want to add another installable file to the binary distro.
If so, here's a minimal XML proposal:

-- snip --
<?xml version="1.0"?>
<!-- some comment explaining how to get UTF-8 characters into this file -->
<locales
en-US="English -- United States"
fr-FR="Français -- France"
de-DE="Deutsch -- Deutschland"
...
/>
-- snip --
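
Reading such an index back would be cheap.  Here is a minimal sketch in 
Python, assuming the file holds a single <locales> element whose attribute 
names are the lang tags, as in the proposal above; load_locale_index is a 
made-up name.

```python
import xml.etree.ElementTree as ET


def load_locale_index(xml_bytes):
    """Map each lang tag to its label, taken from the root element's attributes."""
    root = ET.fromstring(xml_bytes)
    return dict(root.attrib)
```

Passing the raw bytes of the file lets the parser handle the encoding itself, 
which defaults to UTF-8 when the XML declaration names none.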

I still think that augmenting the existing strings files to add one more
attribute as follows is simpler and more reliable:

-- snip --
<AbiStrings app="AbiWord" ver="1.0" language="de-DE" label="Deutsch -- Deutschland">
-- snip --
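
To show what "get in, grab this string, and get out" might look like, here is 
a Python sketch using the standard library's incremental parser.  It assumes 
the label lives on the root element as proposed; read_label is a hypothetical 
helper, not existing code.

```python
import xml.etree.ElementTree as ET


def read_label(binary_file):
    """Return (language, label) from the root element of a strings file.

    binary_file is a file object opened in binary mode.
    """
    for _event, elem in ET.iterparse(binary_file, events=("start",)):
        # The first "start" event is the root element and its attributes are
        # already populated.  The parser reads the file in chunks, so
        # returning here avoids reading the rest of a large file.
        return elem.get("language"), elem.get("label")
    return None, None
```

The strings themselves never need to be built into a tree or kept in memory.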

However, I've anted up for a lot more than my $.02 on this, so I'll be quiet
now.

>Dom's other point is also sensible, in that you may not have any fonts on
>your system capable of displaying the names of, eg Nihon go and Mandarin,
>in their native character sets. I suppose the previous argument holds, ie
>if you don't have the right fonts then chances are you're not planning to
>use them in your document.

Yeah. That's the trickery I was worried about.

To be clear, though: for this user in this situation, we really *don't* 
support those languages in any meaningful way.  My first reaction would be 
to just display the naked lang tag -- i.e., (zh-TW) -- to indicate that either:

  - we don't have an appropriate localization to identify it, or
  - it's not currently usable (due to font issues, say).

We'd need to do something like this anyway for users who receive a document
that got tagged with a locale that they don't have installed.
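
The naked-tag fallback is easy to sketch in Python.  The labels mapping and 
the usable predicate are stand-ins for however we end up tracking installed 
localizations and font support; both are assumptions for illustration.

```python
def display_name(tag, labels, usable=lambda tag: True):
    """Return the localized label for tag, or the bare tag in parentheses.

    labels maps lang tags to labels; usable(tag) is a hypothetical check
    for whether the locale can actually be used on this system.
    """
    label = labels.get(tag)
    if label is None or not usable(tag):
        return f"({tag})"
    return label
```

A document tagged with a locale the user never installed would then show up 
as (zh-TW) rather than as gibberish or an empty entry.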

Over the long term, this feels like the Right Thing to do. If they want
zh-TW support, but they don't have it installed, they need to go get an
appropriate zh-TW "language pack" which might include some or all of the
following:

  - string-like stuff to localize the UI
  - fonts to view the content
  - dictionaries, etc. to clean up the content
  - other locale-specific defaults
  - help in that language
  - etc.

We don't have such a solution now, but I think people generally agree that
we're likely to head in this direction. Eventually.

>But it would be nice to show off the languages
>we support, and ugly if the language choice dialog shows random gibberish.

Oh, it's definitely ugly, but at least it's clear that their text would look
like that too. ;-)

I have to admit that I've briefly flirted with the idea of creating and
shipping a single Unicode font with just enough codepoints to render the
text in this dialog. Talk about hacks!

Of course, then users would wind up with the expectation that *choosing*
that language would also work, when it wouldn't. So much for that idea.
Violating expectations you've gone to the trouble of setting is a great way
to piss people off.

>To do that we need not language-localised language names, but
>character-set localised names (in practice, Romanised names would do, I
>think, as in eg "Nihon go" for Japanese). And then some way of detecting
>that we can't display the native names, and using the romanised names
>instead.

This sounds like another promising alternative (for people who don't like
the naked tag approach described above). Can each of our platform font APIs
tell us when a string will get rendered using slugs or garbage instead of
"real" glyphs?

>This doesn't seem to fit all that nicely with the existing localisation
>paradigm, nor with any likely paradigm that would be supported by gettext.
>Pity. More thought required, I think.

I don't know about gettext, but for the current paradigm, you could just as
easily add two labels to the strings file:

  language="ja-JP" label="#$^#^#^" romanized="Nihon go"

... where the first looks like the correct line noise (sorry for the bad
impersonation), and the second is the Latin-1 romanized equivalent.
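
Choosing between the two labels could then look something like this Python 
sketch.  The can_render callback is entirely hypothetical: it stands in for 
whatever each platform's font API can tell us, which is still an open 
question.  The Latin-1 check is only a crude placeholder for it.

```python
def pick_label(native, romanized, can_render):
    """Return the native label if it can be rendered, else the romanized one."""
    return native if can_render(native) else romanized


def latin1_only(text):
    """Placeholder predicate: pretend only Latin-1 text can be displayed."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False
```

With that placeholder, a Japanese native label would fall back to "Nihon go", 
while a label such as "Français" would be shown as-is.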

>Dreamer? I thought your points seemed reasonably down-to-earth and
>pragmatic :-).

Why thank you. I try. :-)

Paul


