Re: AbiWord.Unicode.clue++

Drazen Kacar (dave@srce.hr)
Sat, 20 Mar 1999 12:13:15 +0100


Eric W. Sink wrote:
> AbiWord.Unicode.clue++
[...]

OK, here are a few thoughts. Comments about the current implementation
are based on the behaviour of AbiWord 0.5.2 built on Solaris 7 using
Sun's X server and Sun's Type 5 keyboard. If you thought Unicode was
bad, wait until you come to the wonderful world of the X keyboard
model. :-)

As far as I know, there are two worlds with different needs for I18N.
What works well for one of them is usually considered unusable or
strange by the other. On one side is the CJK world with its multibyte
encodings, and on the other is the ISO 8859-x world with its single-byte
encodings. If you make an application which works perfectly well
with the Latin 1 encoding, it probably won't work well enough in other
Latin x environments, so you can't rely on Latin 1 testers for the
single-byte world.

I can test, advise etc. for the Latin 2 encoding (getting that right
would probably mean you'll have it right for the rest of ISO 8859-x
encodings in languages written from left to right). I don't know
much about multibyte environments, so I won't talk about that.

First of all, there is a discussion on the Mozilla I18N newsgroup/mailing
list which you'd probably want to read. The thread's subject is
"PostScript Text rendering..." None of the versions of Netscape's
browser released so far has been able to correctly print an HTML page
in Latin 2. The browser has always generated Latin 1 PostScript. I think
they are finally trying to get that right, so it might be worthwhile
to read it.

Currently the biggest problem is the lack of a code page selector in
the font dialog. If I have Latin 1 and Latin 2 Helvetica, I can't
use both of them. There should be a menu for this selection.
Don't put various iso-8859-x selections in it if you can avoid it,
since people usually don't remember those numbers. Use something
like "Latin 1", "Latin 2", "Cyrillic", "Greek", etc. I know that
Cyrillic is ISO 8859-5, but I really can't remember which code
page contains Greek characters, so I'd have to look it up somewhere
or go by trial and error.
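
A minimal sketch of such a selector's backing table, in Python for
brevity. The names CODE_PAGES, menu_labels and charset_for_label are
made up for illustration and are not AbiWord code. For the record,
Greek is ISO 8859-7.

```python
# Hypothetical table behind a "code page" menu: the user sees the
# friendly label, the program uses the ISO 8859 part behind it.
CODE_PAGES = {
    "Latin 1": "iso8859-1",
    "Latin 2": "iso8859-2",
    "Cyrillic": "iso8859-5",
    "Greek": "iso8859-7",
}


def menu_labels():
    """Labels to show in the font dialog, in table order."""
    return list(CODE_PAGES)


def charset_for_label(label):
    """Charset name for a menu label; raises KeyError if unknown."""
    return CODE_PAGES[label]
```

The charset names happen to be valid Python codec names too, so
"text".encode(charset_for_label("Latin 2")) works as expected.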

When the program loads a file from disk and doesn't have the
fonts which are specified in the file, don't just blindly use
your fallback font. If the document was written in Latin 2, I really
don't want to see it in Latin 1 Times. Any Latin 2 font would be
better than that.
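
One possible way to do that, sketched in Python: look at the last two
fields of the X font name (XLFD), which carry the charset registry and
encoding, and prefer any installed font whose charset matches the
document. The function names and the font names in the usage note are
invented for the example.

```python
def xlfd_charset(xlfd):
    """Return 'registry-encoding' (e.g. 'iso8859-2') from an XLFD name."""
    fields = xlfd.split("-")
    return (fields[-2] + "-" + fields[-1]).lower()


def pick_fallback(available, wanted_charset):
    """First available font in the wanted charset, or None if there is none."""
    for name in available:
        if xlfd_charset(name) == wanted_charset:
            return name
    return None
```

Given a Latin 1 Times and a Latin 2 Arial, a Latin 2 document would
get the Arial:

    fonts = ["-adobe-times-medium-r-normal--12-120-75-75-p-64-iso8859-1",
             "-monotype-arial-medium-r-normal--12-120-75-75-p-0-iso8859-2"]
    pick_fallback(fonts, "iso8859-2")   # the Arial entry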

The current save format for non-ASCII characters is &#x<hex code>. This
is not desirable. If the document is written in a single code page, it
would be much nicer if that code page was used in the save file, so
I could use all the Unix text tools on it. It's very nice that the
document format is human-readable XML, but if you put multi-character
escapes in there, it isn't very usable any more. If the document uses
several code pages, I suppose UTF-8 would be the right thing.
Except in one case. If we have a document with German and Polish
text, it could happen that the German was written using Latin 1 and
the Polish is in Latin 2. However, the German characters are also
included in Latin 2 (at the same positions), as they are in several
other code pages, so you'd still be able to save it in a single-byte
encoding.
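
The selection rule could look something like this (a Python sketch;
pick_encoding and its candidate list are assumptions for illustration,
not anything AbiWord does today): try each single-byte code page in
turn and fall back to UTF-8 only if none of them can hold the whole
text.

```python
def pick_encoding(text, candidates=("iso8859-1", "iso8859-2")):
    """Return the first candidate that can encode all of text, else 'utf-8'."""
    for enc in candidates:
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue  # some character is missing from this code page
        return enc
    return "utf-8"
```

With this, pure German text is saved as Latin 1, mixed German and
Polish text ends up in Latin 2, and only something like Polish plus
Russian falls through to UTF-8.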

Then, there is a localization issue. If I want to translate the menu
entries to Czech, I'd have to use Latin 2 for that, so your localization
code must provide a way to specify the code page. More than that, you'd
have to provide room for several font names to be tried. GTK uses
Helvetica as its default font. Solaris, for example, doesn't come with
a Latin 2 Helvetica, but there is Arial. This could be changed in the
gtkrc file, but you can't rely on that. An additional problem is that
GTK doesn't know about X resources, and the information in gtkrc might
not be correct if the application uses a non-local display.

Your keyboard shortcut for Exit in the File menu is not implemented
correctly. The menu says that the shortcut is Alt+F4 (BTW, I'd be
happier if this was Alt+Q), but when I press Alt+F4 nothing happens,
because the shortcut is actually Mod1+F4. On my keyboard the Alt key
has the Alt_L KeySym and the Mod4 modifier. There are two Meta keys
with the Meta_L & Meta_R KeySyms and Mod1, so this shortcut actually
works with either of them. It's kind of confusing for an average user.
Netscape's shortcuts work with both Alt and Meta. Since Motif allows
menu entries to have only one shortcut, Netscape uses translations
instead of shortcuts. If you don't intend to use Alt and Meta
combinations for different things, think about this.
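
To illustrate the idea (not Netscape's or AbiWord's actual code), here
is a small Python sketch that works out which modifier bits carry Alt
or Meta on a given keyboard and accepts any of them. The mask values
are the ones from X11's X.h; the modifier_map dictionary is a made-up
stand-in for what the X server reports (roughly what xmodmap prints).

```python
# Modifier bit masks as defined in <X11/X.h>.
MOD_MASKS = {
    "Mod1": 1 << 3, "Mod2": 1 << 4, "Mod3": 1 << 5,
    "Mod4": 1 << 6, "Mod5": 1 << 7,
}
ALT_META_KEYSYMS = ("Alt_L", "Alt_R", "Meta_L", "Meta_R")


def alt_meta_mask(modifier_map):
    """OR together the masks of all modifiers bound to an Alt or Meta key."""
    mask = 0
    for mod, keysyms in modifier_map.items():
        if any(k in ALT_META_KEYSYMS for k in keysyms):
            mask |= MOD_MASKS.get(mod, 0)
    return mask


def is_exit_shortcut(keysym, state, modifier_map):
    """True for F4 pressed together with any Alt or Meta modifier."""
    return keysym == "F4" and bool(state & alt_meta_mask(modifier_map))
```

On the keyboard described above the map would be something like
{"Mod1": ["Meta_L", "Meta_R"], "Mod4": ["Alt_L"]}, and F4 would then
trigger Exit with either the Mod1 or the Mod4 bit set in the event.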

There are some things which won't work because the Unix localization
model is broken, but you can't do anything about that.

Ah, yes... Importers for various file formats of Windows origin
would have to do some conversions. Microsoft uses the Windows-1252
code page for the Latin 1 world. That one is a superset of the Latin 1
code page, with the additional characters in positions from 128
to 159. Instead of Latin 2 they use an abomination called Windows-1250.
This is *not* a superset of ISO Latin 2. The only design goal
was to have copyright and trademark characters at the same positions
as in Windows-1252, so they mixed up some characters. I think
that the same thing happened with other single-byte code pages,
but I'm not sure.
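
A small Python sketch of the kind of conversion an importer would need.
The function name is made up; the codecs "cp1250" and "iso8859-2" are
the standard Python ones. Characters which Latin 2 doesn't have (the
copyright sign, for example) come out as "?" here, which a real
importer would probably want to handle better.

```python
def cp1250_to_latin2(data):
    """Re-encode Windows-1250 bytes as ISO 8859-2 bytes.

    Characters missing from ISO 8859-2 are replaced with '?'.
    Bytes that Windows-1250 leaves undefined raise UnicodeDecodeError.
    """
    return data.decode("cp1250").encode("iso8859-2", errors="replace")
```

For example, s-caron is byte 0x9A in Windows-1250 but 0xB9 in Latin 2,
so cp1250_to_latin2(b"\x9a") gives b"\xb9".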

-- 
 .-.   .-.    Life is a sexually transmitted disease.
(_  \ /  _)
     |        dave@srce.hr
     |        dave@fly.cc.fer.hr

