Locale choice


Subject: Locale choice
From: Tim Allen (tim@proximity.com.au)
Date: Thu Mar 01 2001 - 18:10:55 CST


Paul's recent post about various corner conditions in locale choice got me
wondering whether it might be feasible to implement a more sophisticated
locale fallback mechanism. Simply going on either the language code or the
country code doesn't cover all cases, and doesn't cover the obvious order
of preference many users would have.

In the following, I haven't done a thorough job of researching exact
language and country codes, so forgive me if I get a few wrong. I'll spell
out the country and language name so there is no ambiguity.

One simple example: my native locale is en_AU (English as spoken in
Australia). If I were using some software that didn't support my particular
locale, then my fallbacks would be, in order of preference, en_NZ (New
Zealand English - the few minor vocab differences, such as jandals instead
of thongs, don't tend to occur much in software :)), en_GB (British
English), en_CA (Canadian English), en_US. Come to think of it, en_ZA and
even en_IE (Irish English) would probably go in there somewhere ahead of
en_US. BTW, I don't want to start any sort of nationalistic flame war; the
order here is purely by how convenient that particular variety of English
is for me. I guess that a New Zealander
would have a similar list to me, except with en_NZ and en_AU swapped. And
presumably for each of the above nationalities one could construct a
similar ordered list. There may be risks in assuming that everyone in a
particular country has the same preference order, but I think one could
produce an adequate generalisation that would get it right most of the
time.

For French speakers, I presume that a similar concept applies, ie a
Quebecois would prefer fr_CA, could cope happily with fr_FR and would use
fr_CH (Swiss French) at a pinch. And maybe one or another African
French-speaking locale would be acceptable, even though it's not all that
likely that one of those would exist and fr_FR not.

So far, it's just a matter of checking all the other locales with the same
language code, in a particular order.
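
A minimal sketch of that same-language case in Python, for illustration only.
The table FALLBACK_CHAINS and the function pick_locale are hypothetical names,
not part of any existing i18n library, and the chains simply copy the en_AU
and fr_CA orders guessed at above.

```python
# Hypothetical per-locale preference chains. Only two locales are filled in,
# using the orders suggested in this post; a real table would need many more.
FALLBACK_CHAINS = {
    "en_AU": ["en_NZ", "en_GB", "en_CA", "en_US"],
    "fr_CA": ["fr_FR", "fr_CH"],
}


def pick_locale(wanted, available):
    """Return the wanted locale or its first available fallback, else None."""
    for candidate in [wanted] + FALLBACK_CHAINS.get(wanted, []):
        if candidate in available:
            return candidate
    return None


print(pick_locale("en_AU", {"en_US", "en_GB"}))  # en_GB
```

The lookup is just a linear walk down the chain, so the hard part is
collecting the data, not the code.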

Now how about Indonesian and Malay? Indonesian (code id) is similar enough
to Malay (code my, iirc, certainly different from id) that I'm sure a
speaker of one language would be adequately comfortable using software
written in the other. Malay is spoken in Malaysia, Singapore, Brunei, and
a dialect thereof is spoken in southern Thailand (don't know what the
language code is, don't really know how close to Malay it is
either). Indonesian is spoken in Indonesia, obviously, but also in
Surinam.

So I speculate that a Malay speaker in Singapore would prefer, in order,
my_SG, my_MY, my_BN (or whatever Brunei is), my_TH, id_ID, id_SN
(Surinam). And an Indonesian speaker in Indonesia would prefer id_ID,
id_SN, my_MY, my_SG, my_BN, my_TH.

So this is trickier than just matching language codes, or matching country
codes; you have to be able to relate together similar languages.
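
One way this relation between languages might be sketched in Python is shown
below. RELATED_LANGUAGES and fallback_candidates are hypothetical names, and
the codes are the ones used in this post rather than verified ISO codes.

```python
# Hypothetical table: for each language code, the related languages in order
# of closeness. Codes are as written in this post and are not verified.
RELATED_LANGUAGES = {
    "id": ["my"],
    "my": ["id"],
}


def fallback_candidates(wanted, available):
    """List available locales: same language first, then related languages."""
    language = wanted.split("_")[0]
    result = []
    for lang in [language] + RELATED_LANGUAGES.get(language, []):
        matches = sorted(loc for loc in available if loc.split("_")[0] == lang)
        # Stable sort: an exact match on the wanted locale moves to the front.
        matches.sort(key=lambda loc: loc != wanted)
        result.extend(matches)
    return result


print(fallback_candidates("id_ID", {"my_MY", "id_SN", "my_SG", "en_US"}))
# ['id_SN', 'my_MY', 'my_SG']
```

Within one language this sketch just sorts alphabetically; a per-country
preference order like the ones listed above would need extra data.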

Let's get even more ambitious. A Javanese speaker in Indonesia (I guess
the language code for Javanese is jw, but don't know) would obviously
prefer jw_ID. There are Javanese speakers in Surinam, so jw_SN might
do. And of course, all Indonesians schooled since independence (ie aged 60
or less, roughly) were taught Indonesian at school, so there is every
chance that id_ID would also be an adequate fallback, and then we get back
into the whole my_* chain of preference as well.

Another example: nn_NO and nb_NO (Norwegian Nynorsk and Bokmål).
Presumably these are adequate fallbacks for each other. I'm told that
Norwegian, Swedish and Danish are similar enough to be mutually
intelligible, so perhaps a fallback chain that involves these other
languages would also be sensible.

Apologies for the long diatribe on languages. The point is that if a
certain body of knowledge can be encoded into the software, it can make
reasonably intelligent choices for locale fallbacks. Simply looking at
either the language code or the country code and doing string
matching therewith doesn't really do it.

One option would be to model a chain of preference for each possible
locale; this would be a lot of data. Another option might just be to
support groups of candidate substitutes, maybe at particular levels, and
offer choices to the user. Eg, for en_* locales, all other en_* locales
are level 1 candidates. For id_* locales, all other id_* locales are level
1 candidates, all my_* are level 2 candidates. For jw_* locales, other
jw_* locales are level 1, id_* locales are level 2 and my_* are level 3.
The logic of the user choice
would be to give the user a choice from all the level 1 options. If there
are none, then offer the level 2 options. And if there still aren't any,
dredge up the level 3's, etc.
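
The level scheme could be sketched in Python roughly as follows.
CANDIDATE_LEVELS and choices_to_offer are hypothetical names, the codes are
again the ones used in this post, and treating other jw_* locales as level 1
for Javanese is an assumption made here by analogy with the en_* and id_*
cases.

```python
# Hypothetical level table: language code -> list of levels, where each level
# lists the language codes whose locales are candidates at that level.
CANDIDATE_LEVELS = {
    "en": [["en"]],
    "id": [["id"], ["my"]],
    "jw": [["jw"], ["id"], ["my"]],  # level 1 for jw is an assumption
}


def choices_to_offer(wanted, available):
    """Return (level, locales) for the first level with any available locale."""
    language = wanted.split("_")[0]
    levels = CANDIDATE_LEVELS.get(language, [[language]])
    for level, languages in enumerate(levels, start=1):
        options = sorted(
            loc for loc in available
            if loc != wanted and loc.split("_")[0] in languages
        )
        if options:
            return level, options
    return None, []


print(choices_to_offer("jw_ID", {"my_MY", "id_ID", "en_US"}))
# (2, ['id_ID'])
```

The caller would then show the returned locales to the user and let them
pick, rather than choosing one silently.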

I'm not sure how much of this is a job for the application (eg
AbiWord) and how much is a job for the i18n/l10n mechanism in the
operating system.

Anyway, I doubt that anyone (including me) is likely to get around to
actually implementing anything like this anytime soon, but I wanted to
throw the idea out to see if there is any more interest.

Tim

-- 
-----------------------------------------------
Tim Allen          tim@proximity.com.au
Proximity Pty Ltd  http://www.proximity.com.au/
  http://www4.tpg.com.au/users/rita_tim/


