Re: multi-lingual support under Unix


Subject: Re: multi-lingual support under Unix
From: Karl Ove Hufthammer (huftis@bigfoot.com)
Date: Sat Dec 30 2000 - 12:21:10 CST


----- Original Message -----
From: "Tomas Frydrych" <tomas@frydrych.uklinux.net>
To: <abiword-dev@abisource.com>
Sent: Saturday, December 30, 2000 6:24 PM
Subject: Re: multi-lingual support under Unix

> Adding Language to the formatting is straightforward, you just need
> to add character property 'language' to the list in pp_Property.cpp and
> create some interface for the user to use this property.
>
> > But we also need a *document* language (taken from the current locale
> > when creating a new document). This is the language "for the whole
> > document". You can have sections, paragraphs, quotes &c. in different
> > languages, but only *one* document language. Things this could be used for:
> >
> > * Sorting (different languages use different sorting rules, but the
> > sorting rules are document-wide)
> > * Default typographically correct features (e.g. in Norwegian documents, the
> > default list marker should be an en-dash, not a bullet)
> > * Default date&time field/language of date&time field
> > * Smart-quotes (not sure if this should be document-wide or change when you
> > change language of a section -- perhaps it's language-dependent!)
>
> Actually, it seems to me that all of these are potentially language
> dependent.

Potentially, yes.

> If I have an English document with a sorted table
> containing some Hebrew stuff, I definitely do not want to use
> English sorting order on that table,

Since English doesn't contain any Hebrew characters, this shouldn't make a
difference (Hebrew characters should always be sorted according to Hebrew
sorting rules). But if you have, for example, a bibliography (references list)
where the names of the authors are written in different languages (an English
author typically has an English name), you would use *one* sorting algorithm for
the whole list (that of the document language). (It makes no sense to sort a
list if each list item has different sorting rules!)
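
To make the bibliography point concrete, here is a rough sketch in Python. The alphabet tables and the function names are made up for illustration (real collation rules are far richer); the only point is that the whole list is sorted with one key, chosen by the document language.

```python
# Toy alphabets, assumed for illustration only; not real collation data.
ALPHABETS = {
    "en": "abcdefghijklmnopqrstuvwxyz",
    "no": "abcdefghijklmnopqrstuvwxyzæøå",
}


def collation_key(text, doc_lang):
    """Build a sort key for text using the alphabet of the document language."""
    alphabet = ALPHABETS[doc_lang]
    rank = {ch: i for i, ch in enumerate(alphabet)}
    # Letters outside the alphabet sort after all known letters,
    # with ties broken by code point.
    return [(rank.get(ch, len(alphabet)), ch) for ch in text.lower()]


def sort_for_document(items, doc_lang):
    """Sort every item with the same rules, whatever language each item is in."""
    return sorted(items, key=lambda item: collation_key(item, doc_lang))


authors = ["Øverland", "Zola", "Andersen", "Ångström", "Smith"]
norwegian_order = sort_for_document(authors, "no")
english_order = sort_for_document(authors, "en")
```

With the toy tables above, the same five names come out in a different order for a Norwegian document than for an English one, but each list is internally consistent.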

> nor do I necessarily want to use
> default English bullets in Hebrew sections;

Three points:

* This is only the *default* bullets. When you open the bullets dialog, English
bullets should be selected as default in the English document. You can always
change it to something different (just as you already can now).

* This really hasn't much to do with what you *want*, but with what's correct
typography (in a specific language).

(I have no idea what Hebrew uses for bullets -- perhaps these should be used
instead of English ones.)

(It may seem that all this would complicate things for the user, but it
wouldn't; on the contrary, actually. The French church secretary, for instance,
wouldn't need to look up the rules on the finer points of French typography;
when they write French documents in AbiWord, they would *automatically* get
correct bullets and French quotation marks (like « this », where the space
between the word and «/» is slightly narrower than a normal word space).)

> same goes for Date and
> Time fields, if I decide that I want a field formatted in a different
> language than the document language, then I would also expect it
> to use the conventions of that other language.

Again, this is only for the default format. Actually, I think a language list
box in the Date and Time field will be necessary/useful.
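
A minimal sketch of that idea in Python: the document language supplies the default format, and a per-field language (the list box) overrides it. The function name, the `field_lang` parameter and the two hard-coded formats are assumptions for illustration, not anything AbiWord has.

```python
from datetime import date

# Month names for the two languages used in this sketch.
MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "no": ["januar", "februar", "mars", "april", "mai", "juni", "juli",
           "august", "september", "oktober", "november", "desember"],
}


def format_date_field(d, doc_lang, field_lang=None):
    """Format a date field; field_lang (hypothetical) overrides the document default."""
    lang = field_lang or doc_lang
    month = MONTHS[lang][d.month - 1]
    if lang == "no":
        return f"{d.day}. {month} {d.year}"
    # American-style English long date.
    return f"{month} {d.day}, {d.year}"


default_text = format_date_field(date(2000, 12, 30), "no")
override_text = format_date_field(date(2000, 12, 30), "no", field_lang="en")
```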

> Smart quotes are
> also language dependent, for instance in Czech the opening quote
> hangs under the line,

"under the line"? Something similar to double commas? ,,like this'', or
something more sophisticated?

> and if I have an English document with a
> Czech section within which I will have quotes, I do not want them
> to be English quotes.

No, you wouldn't, since according to English (at least American) typography, you
should use the quote marks of the language of the inner text (but not for the
marks surrounding the text). Like this:

(English in French in French)
Il disait: « Il faut mettre l'action en < fast forward >.»

But:

(Norwegian in English)
He said, "Trøndere gråter når «Vinsjan på kaia» blir deklamert."
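
The Norwegian-in-English case can be sketched in Python as follows. The `QUOTES` table and the `quote` helper are hypothetical; the sketch uses curly English quote marks, and the rule it encodes is only the one described here (quote marks follow the language of the text that contains the quotation).

```python
# Opening and closing quote marks per language (just the two used here).
QUOTES = {
    "en": ("\u201c", "\u201d"),  # curly double quotes
    "no": ("\u00ab", "\u00bb"),  # guillemets
}


def quote(text, host_lang):
    """Wrap text in the quote marks of the language of the text containing it."""
    open_mark, close_mark = QUOTES[host_lang]
    return f"{open_mark}{text}{close_mark}"


# The title sits inside a Norwegian sentence, so it gets Norwegian marks.
inner = quote("Vinsjan på kaia", "no")
# The Norwegian sentence sits inside English text, so it gets English marks.
sentence = "He said, " + quote(f"Trøndere gråter når {inner} blir deklamert.", "en")
```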

On the other hand, some languages probably (I'm not sure which ones) say that
you should *always* use the quotation marks of the language of the "main text".
So I do agree with you: all of this is potentially (inline) language dependent,
but it really depends on the document language in the first place. An example:

In Norwegian you should use en-dashes in unordered lists (not available in
8859-1, so I use ordinary dashes here). So if I were writing a letter to a
friend, mentioning some of my computer applications, I would write a list like
this:

- AbiWord
- Mozilla Seamonkey
- etc.

Even though both AbiWord and Mozilla Seamonkey are English words, an en-dash
should be used. Some languages *may* say that the bullets should always
follow the "inline language" (I don't know of any such languages, but they may
exist). But if I write a separate chapter of my Norwegian book in English, only
English bullets should be used there. (Though I would perhaps write this as a
separate document.) In cases like these, the best we can do is offer sensible
defaults. (I.e., English documents use English bullets.)
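
As a rough Python sketch of "sensible defaults": a lookup keyed on the document language, which a section language (such as the English chapter) may override, while the language of individual list items is ignored. The names and the two-entry table are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypographyDefaults:
    list_marker: str
    open_quote: str
    close_quote: str


# Hypothetical defaults table; only the two languages discussed here.
DEFAULTS = {
    "en": TypographyDefaults("\u2022", "\u201c", "\u201d"),  # bullet, curly quotes
    "no": TypographyDefaults("\u2013", "\u00ab", "\u00bb"),  # en-dash, guillemets
}


def default_list_marker(doc_lang, section_lang=None):
    """Pick the default marker from the section language, else the document language.

    The language of individual list items (e.g. English product names in a
    Norwegian letter) is deliberately not consulted.
    """
    return DEFAULTS[section_lang or doc_lang].list_marker


letter_marker = default_list_marker("no")
english_chapter_marker = default_list_marker("no", section_lang="en")
```

This only selects what is preselected in the bullets dialog; the user can still change it.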

> I do not think that for instance MS Word uses document language.

No, and that's really too bad. (We have the chance to make a better WP than MS
Word, not to duplicate its faults.)

> It has a default language, which is a language used when the user
> does not indicate otherwise, but that is quite different. In my view
> language should be a character property only.

I disagree, mainly because of sorting and various defaults. I don't have time
right now, but I'll look up some other uses for a document language.


