Re: how should we localize locale names?


Subject: Re: how should we localize locale names?
From: Paul Rohr (paul@abisource.com)
Date: Sat Mar 17 2001 - 14:16:34 CST


Now we're really getting somewhere! :-)

At 10:18 AM 3/17/01 -0000, Tomas Frydrych wrote:
>Except that AW UI might be localised into 25 languages, but we
>can support 170 languages for the content at the same time. The
>number of languages available for the UI will probably never be the
>same as those for the content, because the content language
>becomes meaningful as soon as you add a dictionary.

It might help here to distinguish the following:

  A. locales we have a UI localization for ... approx 25
  B. locales with a working dictionary ... maybe a dozen?
  C. languages we can edit & display ... heading towards 170

Note that all three can vary independently, and that it's a *lot* easier to
add another localization (A) than it is to create a good dictionary (B).
Moreover, the set that grows fastest is (C), because for each charset added,
we immediately gain a whole batch of editable locales.

In short, (C) >> (A) >> (B).

>Also, the
>content language is outwith the control of the user, since one might
>be sent a document by someone else.

Exactly. We shouldn't ever assume that (B)==(C).

>Basically I think that for each language we have AW UI localised
>into we should also provide a dictionary, so that if you choose
>Catalan for your UI, you should be able to spellcheck Catalan too.

Agreed. (A)==(B) is a useful special case. Speakers of just about any
language would find it ideal if everything (UI, spellcheck, content) Just
Worked for their locale, preferably without ever seeing *any* of the
dialogs in question.

This doesn't mean we'd refuse to ship (A) without (B), of course. :-)

>On the other hand, if we have dictionaries for Sioux available but no
>translation of the UI, we should still add Sioux to the content
>language list (but obviously not to the UI list). If we do not have
>dictionaries, we should not list the language, because we cannot
>spellcheck it; the user will simply have to select the pseudo-
>language "no proofing" to avoid red squiggles everywhere.

Right now we ship AbiWord with *all* known localizations (A = 25), but only
a single dictionary (B = 1), even though it could conceivably edit a much
wider range of content (C < 170).

I've been assuming that we do want a way for people working in languages in
the sizable (C - B) subset to be able to indicate what lang their content
is, *even though* there's currently no way to spell check it.

As far as the UI's concerned, shouldn't the two choices be orthogonal?
Making someone say their language is "none" just to keep it from getting
spellchecked sounds like a design flaw, and a fairly rude one.

Instead, I should be able to have a setup where:

  - I create content in the 4 languages I speak,
  - 2 of which are spellchecked, because I have dictionaries for them,
  - ignoring 15 other installed dictionaries for languages I don't speak.

If that's desirable, then we could support it as follows:

  - have one wide-open locale-selection box for tagging lang
  - have another dictionary-selection box for enabling specific langs

Content in *any* language without an installed, enabled dictionary would be
skipped by the spellchecking process.
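
A minimal sketch of that rule, in Python for brevity. The function names and
the idea of a document as a list of (lang, text) runs are assumptions made
for illustration, not anything taken from the AbiWord sources:

```python
def should_spellcheck(run_lang, installed, enabled):
    """True only if the run's language has an installed *and* enabled dictionary."""
    return run_lang in installed and run_lang in enabled


def runs_to_check(runs, installed, enabled):
    """Keep only the (lang, text) runs the spellchecker should visit."""
    return [(lang, text) for lang, text in runs
            if should_spellcheck(lang, installed, enabled)]


# Hypothetical setup: content tagged in four languages, two of them enabled.
installed = {"en-US", "fr-FR", "de-DE", "ca-ES"}   # dictionaries on disk
enabled = {"en-US", "fr-FR"}                       # languages I actually speak
runs = [("en-US", "hello"), ("cs-CZ", "ahoj"),
        ("fr-FR", "bonjour"), ("de-DE", "hallo")]

checked = runs_to_check(runs, installed, enabled)
```

Here the cs-CZ run keeps its language tag even though no dictionary exists
for it, and the de-DE run is skipped because its dictionary is installed but
not enabled. Tagging and checking stay independent.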

>It is not the squiggles I am worried about (by all means let's put the
>squiggles in), but running the spellchecker interactively, because
>we will not be able to display the misspelled words and the
>alternatives. She would not see noise, but just the nice circles we
>use for chars we cannot display, something like:
>
>Not found: ooooo
>Alternatives:
> ooooo
> ooooo
> ooooo
>
>It is one thing to see the circles in the doc, after all, she insisted
>on opening it, but we should not bother her with something so
>irrelevant and useless as the stuff above.

Oh yeah, the dialog. I never use it, which makes it too easy to forget.
Sorry.

I'm not sure that this corner case will be very common -- how likely is it
that users will have and want to use content *and* dictionaries for a
language they can't see or speak?

Still, for the sake of completeness, we should handle this cleanly. I'm
assuming that this particular case is pretty easy to recognize at run-time,
no? If so, several potential solutions come to mind:

1. Just skip that occurrence.

2. Do #1, but first throw up a localized warning dialog saying something
like:

  "Portions of this document are written in languages we can't display.
  None of those languages will be spell-checked."

Note that it probably doesn't matter at this point *which* language(s) are
being skipped, since there's not much the user can do about it anyhow.
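
Both options can be sketched together, assuming the run-time check boils down
to "is this language in the set we can display?". The function name and the
`displayable` set are hypothetical:

```python
WARNING = ("Portions of this document are written in languages we can't display. "
           "None of those languages will be spell-checked.")


def plan_interactive_check(run_langs, displayable):
    """Split a document's run languages into those to check and those to skip.

    Returns (to_check, warn). `warn` is True when at least one run was skipped
    for being undisplayable, so the caller can show the warning once up front
    instead of once per occurrence.
    """
    to_check = [lang for lang in run_langs if lang in displayable]
    warn = len(to_check) != len(run_langs)
    return to_check, warn


to_check, warn = plan_interactive_check(["en-US", "zh-TW", "fr-FR"],
                                        {"en-US", "fr-FR"})
if warn:
    print(WARNING)  # option 2; omit this for option 1 (silent skip)
```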

>> Or, better yet, instead of a document-specific option, it's probably even
>> cleaner to have an app-wide option to only *enable* dictionaries for the
>> languages you *do* speak.
>>
>> If we set that expectation, and people agree that it's reasonable, then we
>> could safely and silently ignore *any* content that we don't find an
>> enabled dictionary for.
>
>That sounds like a very good idea to me, but you will still need to
>be able to deal with the case where the user will not set this option
>and gets a doc with language for which the fonts are not there.

By default, I think we should automatically disable dictionary support for
any languages we can't cleanly render on that platform.
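
As a rough sketch, that default is just a set intersection. The names below
are made up, and how we'd actually detect "cleanly renderable" on a given
platform is left open:

```python
def default_enabled_dictionaries(installed, renderable):
    """Dictionaries switched on by default: installed AND renderable here."""
    return set(installed) & set(renderable)


installed = {"en-US", "ru-RU", "zh-TW"}
renderable = {"en-US", "ru-RU"}   # hypothetical: no usable CJK font here
enabled_by_default = default_enabled_dictionaries(installed, renderable)
```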

>> OK. I think I get it now. :-) Your big issue is to make sure that the
>> interface is never ugly.
>> Specifically, this means keeping the names of those locales or languages
>> from being displayed in the UI *in* their own language, unless we have the
>> proper font & charset support to do so cleanly.
>
>Yes, now you can see right through me :-).

Yay! I *do* understand the goal.

>There is though one
>additional problem with listing the languages in their own language:
>what order will they come in? Notably, where will you put Russian
>in relationship to Romanian; will/should/can they be near each
>other? And where will the CJK languages, Arabic, Hebrew go?
>Will/should Serbian (Cyrillic) and Serbian (Latin) be next to each
>other (same language, two alphabets)? This is difficult as is, but
>even more so because our UI is 8-bit, so that Russian 'R' will not
>be anywhere near English 'R', but will overlap with probably entirely
>unrelated letters in Hebrew, Arabic, Thai, etc.

Yep. There are (at least) two such problems here.

1. While I know the sort order for *my* language's names for all other
languages, I'm less likely to be able to find *their* name for their
language, because "Deutsch" shows up somewhere other than "German" would.

2. Charset issues could make problem #1 even worse, depending on which
locale's sort order is used.

I haven't given this much thought, but my off-the-cuff suggestion would be
to *not* attempt to sort the list by name at all. Instead, why not sort by
the underlying lang codes (which I think we should always be displaying
anyhow)?

Thus, my usual example would sort as follows:

  Deutsch - Deutschland (de-DE)
  English - Canada (en-CA)
  English - United Kingdom (en-GB)
  English - United States (en-US)
  Francais - Canada (fr-CA)
  Francais - France (fr-FR)
  000000 - 000000 (he-IL)
  0000000 - 00000 (zh-TW)
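
In code, this amounts to sorting on the tag and ignoring the display name
entirely. A small Python sketch, assuming each entry is a (code, name) pair
and using the same placeholder names as above for the ones this mail can't
show:

```python
entries = [
    ("zh-TW", "0000000 - 00000"),
    ("en-US", "English - United States"),
    ("fr-FR", "Francais - France"),
    ("he-IL", "000000 - 000000"),
    ("de-DE", "Deutsch - Deutschland"),
    ("en-GB", "English - United Kingdom"),
    ("fr-CA", "Francais - Canada"),
    ("en-CA", "English - Canada"),
]

# The sort key is the lang code only, so the charset and collation of the
# native names never enter into the ordering.
by_code = sorted(entries, key=lambda entry: entry[0])

for code, name in by_code:
    print(f"{name} ({code})")
```

Because the key is plain ASCII, the order comes out the same no matter which
locale's collation rules are in effect.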

Alternatively, I suppose we could sort by country:
  
  English - Canada (en-CA)
  Francais - Canada (fr-CA)
  Deutsch - Deutschland (de-DE)
  Francais - France (fr-FR)
  English - United Kingdom (en-GB)
  000000 - 000000 (he-IL)
  0000000 - 00000 (zh-TW)
  English - United States (en-US)

However, given our prior discussions about language being a much better
fallback than country, I suspect that we want similar languages grouped
together, rather than similar countries.
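
To make the difference concrete, here is a quick sketch that counts how many
contiguous same-language runs each ordering produces (fewer runs means better
grouping). The helper names are hypothetical, and tags are assumed to have
the simple "lang-COUNTRY" shape used above:

```python
from itertools import groupby

tags = ["de-DE", "en-CA", "en-GB", "en-US", "fr-CA", "fr-FR", "he-IL", "zh-TW"]


def country_key(tag):
    """Sort key that puts the country first and the language second."""
    lang, country = tag.split("-")
    return (country, lang)


def language_runs(ordered):
    """Number of contiguous runs of identical language in the given order."""
    return sum(1 for _ in groupby(ordered, key=lambda t: t.split("-")[0]))


by_language = sorted(tags)                  # lang code first
by_country = sorted(tags, key=country_key)  # country code first

print(language_runs(by_language), language_runs(by_country))
```

With this sample the language-first order keeps each language in one block,
while the country-first order splits English into three separate places.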

>On the UI list, this is less of an issue, because the list is likely to
>be shorter, and we could in fact split it into several smaller lists
>that use the same alphabet, indicating what is going on by a few
>letters from the alphabet above each sublist (and having one for any
>odd languages). In the content language list this is a problem, for a
>person using utf-8 locale (or win NT) may want all the langs we can
>handle listed, and under utf-8 we can handle pretty much anything.

Yep. Good GUI design skills will certainly be needed for the content
language dialog, because there could be a *lot* of relevant choices to wade
through.

At minimum, it should be a pretty tall dialog, so that scrolling through it
isn't that obnoxious.

Other ideas include Tim's AI pruning logic, or some other way to allow
people to filter out (or zero in on) a particular language or country. For
example, if we wanted a *really* snazzy dialog, we could implement an XP
preview widget to throw up a clickable map of the globe where you select a
continent, and only languages spoken on that continent are shown on the
list. :-)

In any event, the ultimate fallback will always be to show the full list in
all its glory, so we definitely want to do a good job there.

>It seems to me that the only way to provide easy to use interface in
>this case is to have the whole list properly localised.
>
>Please do not get me wrong, I do see the attractiveness of a list where
>each language is displayed in its native alphabet, but I have serious
>doubts that we can get AW to Just Work (to borrow your phrase)
>without having the ability to refer to each language we support in a
>localised manner.

You may be right that it's "the only way," but I'm still trying to come up
with GUIs that meet this criterion.

>> I think our only difference is that in addition to the following pretty
>> list: ...
>> I'm also willing to tolerate intelligibly ugly names for ugly content:
>>
>> #$^%# - #$^% (zh-TW)
>> And you're not. :-)
>Yes, that pretty much sums it up. Honestly, if you were using
>Word 2000 and a string like that (or just the ooooo) popped up at
>you somewhere in a dialogue, would you feel it Just Worked, or
>would you get the pleasant feeling some of us get when they see that
>M$ screwed up?

They control both the platform *and* the product, so people's Just Works
criteria for them should be a tad higher. In fact, their usual way to avoid
this problem is to prune those "additional" languages from the list and
refuse to support them *at all* in their binaries. (That's also an option
for us, I suppose, but we're Open Source, so we can do better.)

Remember that the only time this problem arises is if the *platform* is
broken, or we aren't taking full advantage of it. I'm just not sure that
the Right Thing to do is totally hide how broken the platform is, when we'd
be fine otherwise. The *content* will be "broken" anyways as soon as you
start typing, so this just gives you some advance warning.

To be honest, this is one place where I'd enjoy the marketing opportunity
offered by answering the FAQ it triggers. :-)

Paul,
the guy responsible for s_tellNotImplemented()


