Re: Non-latin encoding and languages


From: sterwill@abisource.com
Date: Tue Feb 29 2000 - 09:29:27 CST


Vadim Frolov wrote:
> I have little experience with AbiWord and I may be wrong, but I see
> a potential problem with non-Latin (in my case, Cyrillic) encodings.
> AbiWord's current encoding model uses one encoding scheme per
> language, but for Russian (Cyrillic), for example:
> - Unix platforms use the KOI8-R code page
> - MS Windows uses code page 1251
> - DOS and OS/2 use CP 866.
> When I try to open MS Word *.doc files in AbiWord under X11R6 (FreeBSD
> 3.4), I see garbled text (the current version, 0.7.8, does not convert
> CP1251 to KOI8-R).
> Maybe this is a problem for Cyrillic only, but I'm not sure....

Yes, this does sound like a problem. I have little experience with
problems like these, but they'll need to be solved to make AbiWord
usable for people whose languages fall outside the Latin-1 set.

As usual, there seem to be two problems (I'll try to keep this short,
so people more qualified than me can jump in). The first is
internationalization of the document itself, so that users can write
documents in their native character sets. The encoding issues here
depend on the fonts the user has, and should all map into Unicode
space if we do our job correctly. We haven't done much work on this
front for non-Latin-1 languages, so I wouldn't be surprised if we
have no fonts that can handle the KOI8-R encoding.
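
To illustrate what "mapping into Unicode space" means, here is a minimal Python sketch, assuming Python 3.9+ and its built-in "koi8_r", "cp1251" and "cp866" codecs. The same Cyrillic capital A is a different byte in each of the three code pages from the quoted message, yet all three decode to the single code point U+0410. The function names are hypothetical and are not AbiWord code.

```python
# Sketch: one Cyrillic letter, three legacy byte values, one Unicode code point.
# Relies only on Python's built-in codecs for the three Russian code pages.

LEGACY_ENCODINGS = ("koi8_r", "cp1251", "cp866")


def legacy_bytes(text: str) -> dict[str, bytes]:
    """Encode the same Unicode text in each legacy Cyrillic code page."""
    return {enc: text.encode(enc) for enc in LEGACY_ENCODINGS}


def to_unicode(data: bytes, encoding: str) -> str:
    """Map bytes from a known legacy encoding into Unicode."""
    return data.decode(encoding)


if __name__ == "__main__":
    letter = "\u0410"  # CYRILLIC CAPITAL LETTER A
    for enc, raw in legacy_bytes(letter).items():
        # Prints the code page, its byte(s) in hex, and the decoded code point.
        print(enc, raw.hex(), hex(ord(to_unicode(raw, enc))))
```

Once text is in Unicode, the document model no longer needs to know which code page it came from; only the import, export and font layers do.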

Actually, there's also a problem 1.5: the importers and exporters,
which will need to do character-mapping conversions like the one you
mention.
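
A minimal sketch of the conversion such an importer would need, assuming it already knows the source code page (CP1251, as in a .doc written on Windows) and the target (KOI8-R). `recode` is a hypothetical helper, not an AbiWord function. It routes through Unicode instead of using a direct CP1251-to-KOI8-R table, so each additional code page needs only one mapping to and from Unicode.

```python
def recode(data: bytes, src: str, dst: str, errors: str = "replace") -> bytes:
    """Convert bytes between two legacy code pages by way of Unicode.

    With errors="replace", undecodable input bytes become U+FFFD and
    characters the target code page lacks become b"?", so a lossy
    import still yields a document instead of failing outright.
    """
    text = data.decode(src, errors=errors)
    return text.encode(dst, errors=errors)


if __name__ == "__main__":
    original = "Привет"
    word_bytes = original.encode("cp1251")  # bytes as stored by Windows
    print(word_bytes.decode("koi8_r"))      # misread as KOI8-R: garbled text
    koi8 = recode(word_bytes, "cp1251", "koi8_r")
    print(koi8.decode("koi8_r"))            # correct text again
```

The first `print` reproduces the symptom described in the quoted message: the CP1251 bytes are valid KOI8-R, just the wrong letters.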

The second problem, which I just realized AbiWord has, is that
localizations need encodings too. All our current menu, toolbar, and
string sets map into Latin-1 space, but for locales with more than one
encoding, we may have to figure out a way to represent them all.
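
One possible approach, sketched in Python under the assumption that string sets could be stored in Unicode rather than Latin-1: keep a single string set per language and encode it for the platform's code page only at display time. The identifiers `RUSSIAN_STRINGS`, `RUSSIAN_CODEPAGES` and `localized()` are hypothetical and do not reflect AbiWord's actual string-set API.

```python
# Hypothetical string set: stored once, in Unicode.
RUSSIAN_STRINGS = {
    "MENU_FILE": "Файл",
    "MENU_EDIT": "Правка",
}

# Code pages for Russian per platform, from the list in the quoted message.
RUSSIAN_CODEPAGES = {
    "unix": "koi8_r",
    "windows": "cp1251",
    "dos": "cp866",
    "os2": "cp866",
}


def localized(string_id: str, platform: str) -> bytes:
    """Return a UI string encoded for the given platform's code page."""
    return RUSSIAN_STRINGS[string_id].encode(RUSSIAN_CODEPAGES[platform])


if __name__ == "__main__":
    for platform in RUSSIAN_CODEPAGES:
        # Same string, different bytes depending on the platform.
        print(platform, localized("MENU_FILE", platform).hex())
```

With this layout, supporting another encoding for a locale means adding one entry to the code-page table rather than maintaining a second copy of every string.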

-- 
Shaw Terwilliger



This archive was generated by hypermail 2b25 : Tue Feb 29 2000 - 09:29:28 CST