Re: how should we localize locale names?


Subject: Re: how should we localize locale names?
From: Paul Rohr (paul@abisource.com)
Date: Fri Mar 16 2001 - 18:26:41 CST


Tomas,

Please forgive me for overexplaining (again), but I just want to make sure
we're understanding each other.

At 10:48 AM 3/16/01 -0000, Tomas Frydrych wrote:
>> Paul:
>> I've thought all along that it'd be cool to be able to do any of the
>> following at run-time by choosing the appropriate menu:
>>
>> - change the UI locale
>> - change the default lang for a document
>> - change the lang of a specific portion of a document
>
>> Shouldn't these all have similar UIs with similar choices?
>
>To do this at runtime is fair enough, but (1) and (2-3) are only
>related by sharing similar font-related problems. So I think they do
>not need the same UIs.

I'm not sure that most users would agree with you. The distinction between
languages and locales is pretty subtle. In both cases, I want to choose the
dialect of English or Deutsch that *I* speak.

When you're being asked the question, the answers seem very very similar
(and usually identical, in fact), so presumably the UI should be very
similar too.

If the answer is en-US or de-DE in both cases, I should be able to tell you
in the exact same way. *Both* dialogs should have a list which is either
localized:

  English - United States (en-US)
  German - Germany (de-DE)

or else globalized:

  English - United States (en-US)
  Deutsch - Deutschland (de-DE)
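The idea that both dialogs draw on one list can be illustrated with a minimal Python sketch, assuming a hypothetical in-memory name table keyed by language tag. `NAMES` and `format_entry` are illustrative names, not AbiWord code. Switching between the localized and native ("globalized") forms is a single flag, so the two dialogs cannot drift apart.

```python
# Hypothetical name table: tag -> ((localized language, localized region),
#                                  (native language, native region)).
# "Localized" here means translated into an English UI.
NAMES = {
    "en-US": (("English", "United States"), ("English", "United States")),
    "de-DE": (("German", "Germany"), ("Deutsch", "Deutschland")),
}

def format_entry(tag: str, native: bool) -> str:
    """Format one list row as 'Language - Region (tag)'."""
    localized_names, native_names = NAMES[tag]
    lang, region = native_names if native else localized_names
    return f"{lang} - {region} ({tag})"

# The same function serves both the "locale" and the "lang" dialog.
for tag in NAMES:
    print(format_entry(tag, native=False))
for tag in NAMES:
    print(format_entry(tag, native=True))
```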

Having two dialogs which are otherwise identical (except for this) just
feels busted ... so much so, in fact, that if we decide that the "lang"
dialog should be localized (and I seem to be losing that debate), then I'd
probably prefer to retract my proposal for globalizing the "locale" dialog,
just so they match.

>(1) is about the environment in which AW runs;
>if the environment does not support a particular locale, that is the
>user's problem, and we will not try to run under such a locale,
>since we cannot.

Actually, we can guess (1) using the system locale, but users may prefer to
run the Abi UI using any other localization we provide. Currently, the only
way to switch UI locales is to hand-edit the preferences file. We may need a
locale dialog at launch, but it'd also be nice to switch at run time.

For example, I have no idea whether my NT laptop supports Catalan or not,
but at least for a little while, I'd love to run AbiWord in Catalan. Ought
to make for some nice screenshots, no?

Thus, the set of choices here would be:

  - any locale that we have a translation/localization for,
  - unless the fonts are so screwed up that it looks bad.

>On the other hand, (2-3), or at least (3), is about
>capabilities of AW that are not dependent on the enviroment. If the
>user does not have suitable fonts to display Arabic text, we are still
>able to do everything that the lang property is for (providing we have
>the dictionaries available); basically I think the contents of the lang
>list should be static, it is about what AW can do.

I still don't see the difference. "What AW can do" is very much a function
of some very different things:

  - which platform you're running on
  - which fonts you have installed
  - whether we're taking full advantage of those platform/font services
  - which localizations you have installed
  - which dictionaries you have installed

The first two are system issues, and they're far more likely to affect you
negatively than the last two, Abi-specific ones.

Two specific examples:

1. Abi OK, platform busted
---------------------------
We have localizations for zh-TW and zh-CN, but AFAIK, they've only been
tested on pretty customized Linux environments. I'd be really surprised if
they Just Worked on a fairly vanilla Win95 box.

Should those languages be listed in the "lang" dialog?

2. platform OK, no specific Abi support
----------------------------------------
Alternatively, Swahili is a Latin-1 language (I think), so we have platform
support to render the content, even though we don't currently have
dictionaries (for spell check) or localizations (for the UI).

Should that language be listed in the "lang" dialog?

in either case
--------------
Theoretically, I should be able to tag my documents with any known language,
whether I have support for that language installed or not.

For example, say someone sends me a Catalan document with an English passage:

  <abiword ... xml:lang="ca-ES">
  ...
  <p><c props="lang:en-US">The only senntence I know how to read.</c></p>
  ...
  </abiword>

AFAIK, we should be able to properly render and edit that document on all
our supported platforms. Even if I don't have the ca-ES dictionaries or
translation installed, I'd still like to be able to have that en-US
misspelling squiggled. I'm pretty sure we agree on this.
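The tagging in that snippet implies a simple inheritance rule: a span's own `lang` property overrides the document's `xml:lang`. A minimal sketch, assuming `props` is a semicolon-separated list of `key:value` pairs as in the example. It ignores paragraph- and section-level properties, which a fuller implementation would probably also need to consult; `parse_props` and `effective_lang` are hypothetical helpers.

```python
def parse_props(props: str) -> dict[str, str]:
    """Split a props string such as 'lang:en-US; font-weight:bold' into a dict."""
    result = {}
    for item in props.split(";"):
        if ":" in item:
            key, value = item.split(":", 1)
            result[key.strip()] = value.strip()
    return result

def effective_lang(doc_lang: str, span_props: str) -> str:
    """A span's own lang property wins; otherwise it inherits the document's xml:lang."""
    return parse_props(span_props).get("lang", doc_lang)

print(effective_lang("ca-ES", "lang:en-US"))        # en-US
print(effective_lang("ca-ES", "font-weight:bold"))  # ca-ES
```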

Should the behavior be any different if the rest of the document is in
zh-TW (which we support elsewhere)? From your Arabic example below, I
assume you'd agree that it shouldn't.

What if one of the two languages is Sioux, which we currently have no
support for? I know that the language-code is "sio", but I don't even know
what charset they use. Shouldn't the dialog allow me to tag content with
that language, too?

I suppose we could have our existing 25 translators each provide strings for
the 170 languages in the Windows set, but even if Sioux is in there (and I
doubt it), Pierre will point out how many others we're still missing.

>The list could be dynamic, based on what languages we can
>display, in which case you could have them presented in their
>native form, but that does not remove the need for a translation of
>the language name to the language of the current locale. Say that
>a user does not have Arabic fonts, but she gets an English doc
>with some limited Arabic in it for the purposes of proofreading the
>English. She does not want to display the Arabic, which she
>cannot read anyway. She will make some changes to the English
>and then decide to spellcheck the document. What should happen
>when the spellchecker gets to the Arabic portions?

Good scenario.

>(1) If we have
>an Arabic dictionary (and presumably we will be shipping
>dictionaries with AW), we can spellcheck the Arabic; we will run
>into difficulties if there are errors, but could at least let her know
>that there are errors, and she can let the author know, but
>that is not, IMO, a very good approach.

I'm not sure I agree. What's the harm in putting squiggles under misspelled
content in other languages?

If we allow her to read the document at all (and of course, we should), then
the Arabic portions will probably look like line noise, right? That's ugly,
but irrelevant. If she doesn't read Arabic, it doesn't matter whether the
fonts are available or not -- in both cases, she won't be able to read it,
and she won't be able to correct spelling mistakes.

>(2) We could silently ignore
>the Arabic, but that is not a good idea, because the user may in
>fact assume that spellchecking means spellchecking everything.

Yep. She probably would.

However, I think we should turn the UI problem around. It's only marginally
useful (at best) to have spell-checking turned on for languages you don't
speak. It's not like you can correct them accurately anyway.

Thus, if there are multiple languages in a document, it might be nice to
have a dialog listing them all, so that you could toggle *off* spell
checking support for languages you don't speak and can't fix.

Or, better yet, instead of a document-specific option, it's probably even
cleaner to have an app-wide option to only *enable* dictionaries for the
languages you *do* speak.

If we set that expectation, and people agree that it's reasonable, then we
could safely and silently ignore *any* content that we don't find an enabled
dictionary for.
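Under that app-wide setting, spell checking reduces to a filter over tagged spans. A minimal sketch, assuming the document has already been split into hypothetical `(lang, text)` pairs and that the enabled dictionaries are a set of exact tags (no fallback from `en-US` to a generic `en`). `spans_to_check` is an illustrative name, not an existing AbiWord function.

```python
from typing import Iterable, Iterator

def spans_to_check(spans: Iterable[tuple[str, str]],
                   enabled: set[str]) -> Iterator[tuple[str, str]]:
    """Yield only the (lang, text) spans whose language has an enabled dictionary.

    Spans in any other language are skipped silently instead of being flagged.
    """
    for lang, text in spans:
        if lang in enabled:
            yield lang, text

# Hypothetical document: English text with an Arabic passage in it.
doc = [("en-US", "The only senntence"), ("ar-EG", "<Arabic text>"), ("en-US", "more text")]
enabled_dicts = {"en-US"}  # hypothetical preference: dictionaries for languages she speaks

print([lang for lang, _ in spans_to_check(doc, enabled_dicts)])  # ['en-US', 'en-US']
```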

>Thus, irrespective of what and
>how gets displayed in the lang listbox, we need to be able to
>translate the lang property into the language of the interface,
>precisely because we need to be able to refer to the language
>when we *cannot* display it in its native shape. We could of course
>provide a romanised name of the language, but imagine the
>ugliness of the message, if it is in a non-roman alphabet locale,
>say Russian ... It is this ugliness that I am against, because it
>would give us a name for sloppiness.

OK. I think I get it now. :-) Your big issue is to make sure that the
interface is never ugly.

Specifically, this means keeping the names of those locales or languages
from being displayed in the UI *in* their own language, unless we have the
proper font & charset support to do so cleanly.
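That rule amounts to a three-step fallback: the native name if it can be drawn cleanly, otherwise the name translated into the UI language, otherwise the bare tag. A minimal sketch; `can_render` is a hypothetical hook, and the Latin-1 encodability test used here is only a crude stand-in for real font and charset checks.

```python
from typing import Callable, Optional

def latin1_can_render(text: str) -> bool:
    """Crude stand-in for a font/charset probe: can the text be encoded in Latin-1?"""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

def display_name(tag: str, native: Optional[str], localized: Optional[str],
                 can_render: Callable[[str], bool]) -> str:
    """Prefer the native name, then the UI-language name, then the bare tag."""
    if native is not None and can_render(native):
        return native
    if localized is not None:
        return localized
    return tag

print(display_name("fr-FR", "Français", "French", latin1_can_render))  # Français
print(display_name("ru-RU", "Русский", "Russian", latin1_can_render))  # Russian
print(display_name("sio", None, None, latin1_can_render))              # sio
```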

>> I agree that we want to deal with the ugliness. The relevant questions are
>> where and how. There are two places we can run into charset issues:
>>
>> - ugly interface ... unable to display localized menus, dialogs, etc.
>> - ugly content ... unable to cleanly render portions of the document
>
>ugly interface is not an option I am prepared to consider; either we
>can support given locale, and then we can have a nice interface, or
>we cannot, and then we should not pretend we can :-).

When changing UI locales, this means preventing people from switching to
locales that won't work. We'll have to know to prune them from the combo
box at run-time, to prevent them from switching into a gibberish language
and then being unable to get out.

Note that this pruning means that all the locales left *could* be displayed
in their native languages.
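The run-time pruning could look something like this minimal sketch, assuming hypothetical sets of locales we ship translations for and locales the platform can render (the latter would come from some font/charset probe not shown here). It also keeps the active locale listed, so the user always has a way back.

```python
def selectable_ui_locales(translations: set[str], renderable: set[str],
                          current: str) -> list[str]:
    """Locales offered in the UI-locale combo box.

    A locale is offered only if we ship a translation for it AND the platform
    can render it; the active locale is always kept so the user can get back.
    """
    choices = translations & renderable  # new set; the inputs are not modified
    choices.add(current)
    return sorted(choices)

shipped = {"en-US", "ca-ES", "zh-TW"}  # hypothetical: translations we ship
drawable = {"en-US", "ca-ES"}          # hypothetical: result of a font/charset probe
print(selectable_ui_locales(shipped, drawable, current="en-US"))  # ['ca-ES', 'en-US']
```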

>Ugly
>content is a different issue, because documents can contain
>languages that the current locale cannot handle; we display the
>lovely circles, and hope the user gets the hint and moves to utf-8 :-).

When changing content languages, however, you *are* willing to let them
choose languages which won't render properly (in the UI or the document).
This makes sense.

I think our only difference is that in addition to the following pretty
list:

  English - United States (en-US)
  Français - France (fr-FR)
  Deutsch - Deutschland (de-DE)

I'm also willing to tolerate intelligibly ugly names for ugly content:

  #$^%# - #$^% (zh-TW)

And you're not. :-)

In fact, I think the dialog might need an Add button, too, so that users can
add a new language description for ones we don't yet support (for example,
sio-US).
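An Add button would need at least a sanity check on the user-supplied tag. A minimal sketch with a deliberately simplified pattern (a two- or three-letter lowercase language code plus an optional two-letter uppercase region), which is much narrower than the full RFC 3066 tag syntax and requires canonical casing; `add_language` and the table it fills are hypothetical.

```python
import re

# Simplified tag shape, e.g. "sio" or "sio-US". Not full RFC 3066.
_TAG = re.compile(r"([a-z]{2,3})(?:-([A-Z]{2}))?")

def add_language(table: dict[str, str], tag: str, description: str) -> bool:
    """Register a user-defined language; reject malformed or duplicate tags."""
    if not _TAG.fullmatch(tag) or tag in table:
        return False
    table[tag] = description
    return True

user_langs: dict[str, str] = {}
print(add_language(user_langs, "sio-US", "Sioux - United States"))  # True
print(add_language(user_langs, "sio-US", "duplicate"))              # False
print(add_language(user_langs, "Sioux", "not a tag"))               # False
```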

bottom line
-----------
If I haven't convinced you yet (and I doubt I have), then it's time for me
to start switching gears and figure out how to make your "translate 'em all"
proposal work.

But I'll save that for another day. ;-)

Paul

PS: Sorry for being so verbose. I didn't get much sleep last night (wonder
why?) and I want to stop re-explaining myself.



This archive was generated by hypermail 2b25 : Fri Mar 16 2001 - 18:53:12 CST