Re: i18n of abiword -- charset mapping


Subject: Re: i18n of abiword -- charset mapping
From: Paul Rohr (paul@abisource.com)
Date: Fri Jan 14 2000 - 15:13:35 CST


These problems will need to be addressed for any language which doesn't use
the Latin-1 charset.

1. Charset mapping on input. (easy)
------------------------------------
As you say, doing the math to map Thai keycodes (TIS-620) to their Unicode
equivalents should be quite easy. If you find that the current mechanism
for this purpose is insufficient, let us know. We're always interested in
better approaches.
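
As a rough illustration of "doing the math": TIS-620 keeps ASCII in the low
half and places the Thai characters at 0xA1-0xDA and 0xDF-0xFB, which line up
with U+0E01-U+0E3A and U+0E3F-U+0E5B, so the conversion is a fixed offset.
The sketch below assumes nothing about the existing input mechanism, and the
function name tis620ToUnicode is hypothetical, not an AbiWord API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an existing AbiWord function).
// Returns the Unicode code point for a TIS-620 byte, or U+FFFD
// (REPLACEMENT CHARACTER) for bytes that TIS-620 leaves unassigned.
inline std::uint32_t tis620ToUnicode(unsigned char b)
{
    if (b < 0x80)
        return b;                              // ASCII maps to itself

    const bool thai = (b >= 0xA1 && b <= 0xDA) || (b >= 0xDF && b <= 0xFB);
    if (!thai)
        return 0xFFFD;                         // unassigned in TIS-620

    // The Thai block sits at a constant distance from U+0E01.
    const std::uint32_t ucs = 0x0E01u + (b - 0xA1u);
    assert(ucs >= 0x0E01 && ucs <= 0x0E5B);    // sanity check on the offset
    return ucs;
}
```

Charsets without this kind of regular layout need a lookup table rather than
arithmetic.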

2. Charset mapping at rendering time. (reasonable)
----------------------------------------------------
Depending on the font being used for rendering, there may need to be a
mapping back from Unicode characters to another charset. We currently don't
have a general mechanism in the code for this, and one will be needed for
most non-Latin languages.
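
To make the reverse direction concrete, here is a minimal sketch of mapping a
Unicode character back to a TIS-620 byte before handing it to a font that is
indexed by that charset. The name unicodeToTis620 is hypothetical, and a
general mechanism would have to select the mapping per font rather than
hard-code one charset as this does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an existing AbiWord function).
// On success stores the TIS-620 byte in `out` and returns true.
// Returns false when the character has no slot in TIS-620, so the
// caller can fall back to another font or a placeholder glyph.
inline bool unicodeToTis620(std::uint32_t ucs, unsigned char& out)
{
    if (ucs < 0x80) {
        out = static_cast<unsigned char>(ucs); // ASCII maps to itself
        return true;
    }

    const bool thai = (ucs >= 0x0E01 && ucs <= 0x0E3A) ||
                      (ucs >= 0x0E3F && ucs <= 0x0E5B);
    if (!thai)
        return false;                          // not representable

    const std::uint32_t b = 0xA1u + (ucs - 0x0E01u);
    assert(b >= 0xA1 && b <= 0xFB);            // sanity check on the offset
    out = static_cast<unsigned char>(b);
    return true;
}
```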

The easiest way for users to work around this problem is to locate a font
which stores the characters for their language of choice at the appropriate
Unicode positions. Note that this does *not* need to be a complete Unicode
font (as those are still fairly rare).

This suggests that it might be fairly simple to take existing *fonts* which
are commonly used for a given language, and run them through a conversion
process which re-encodes just those characters as a Unicode font instead. I
don't know if there are any existing font-conversion utilities for this
purpose, but if not, it'd make a great project for someone. :-)

general hint
------------
Insofar as we're likely to need a bunch of to/from conversions between
Unicode and various charsets or codepages, it seems like we may need a new
set of efficient mapping classes which can be used for both purposes. As a
quick-and-easy cross-platform (XP) solution, the usual table-driven macro
magic with header files ought to work nicely here, so long as we're willing
to compile the tables into the code.
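
One way the table-driven macro approach could look is sketched below. All
names here (SAMPLE_CP1252_TABLE, CharsetPair, CharsetMap) are made up for
illustration, and only a handful of CP1252 entries are listed. In a real
build the entry list would presumably live in its own header generated from
the raw mapping data, and lookups would use a direct index or binary search
instead of the linear scan shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sample entries only; a full table would be generated, not hand-typed.
#define SAMPLE_CP1252_TABLE(ENTRY) \
    ENTRY(0x80, 0x20AC) /* EURO SIGN */ \
    ENTRY(0x85, 0x2026) /* HORIZONTAL ELLIPSIS */ \
    ENTRY(0x91, 0x2018) /* LEFT SINGLE QUOTATION MARK */ \
    ENTRY(0x92, 0x2019) /* RIGHT SINGLE QUOTATION MARK */ \
    ENTRY(0x99, 0x2122) /* TRADE MARK SIGN */

struct CharsetPair {
    unsigned char local;   // byte value in the legacy charset
    std::uint16_t ucs;     // corresponding Unicode code point
};

// Expand the entry list into a static array compiled into the code.
#define AS_PAIR(local, ucs) { local, ucs },
static const CharsetPair kSampleCp1252[] = { SAMPLE_CP1252_TABLE(AS_PAIR) };
#undef AS_PAIR

// One class serves both directions over the same table.
class CharsetMap {
public:
    CharsetMap(const CharsetPair* table, std::size_t count)
        : m_table(table), m_count(count)
    {
        assert(table != nullptr || count == 0);
    }

    // Returns 0 when the byte has no entry in the table.
    std::uint16_t toUnicode(unsigned char local) const
    {
        for (std::size_t i = 0; i < m_count; ++i)
            if (m_table[i].local == local)
                return m_table[i].ucs;
        return 0;
    }

    // Returns false when the code point has no entry in the table.
    bool fromUnicode(std::uint16_t ucs, unsigned char& local) const
    {
        for (std::size_t i = 0; i < m_count; ++i)
            if (m_table[i].ucs == ucs) {
                local = m_table[i].local;
                return true;
            }
        return false;
    }

private:
    const CharsetPair* m_table;
    std::size_t m_count;
};
```

Because the entry list is a macro, the same data can be expanded a second
time into a different shape (for example a 256-slot direct-index array)
without duplicating the table by hand.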

Since these tables can be quite sizeable, it's probably worth debating the
proposed approach and APIs on this list before implementing them.

In any event, the following seems to be *the* definitive source of raw
material for this purpose:

  ftp://ftp.unicode.org/Public/MAPPINGS/

For extra credit, it might be interesting to figure out how to change the
mapping classes to demand-load the table contents at runtime from resources.
Then all we'd need to do is figure out how to build the necessary
platform-specific resources from these raw tables at build time.
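
As a starting point for the runtime-loading idea, the sketch below parses
mapping data from a stream. It assumes the single-byte table layout of two
hex columns ("0xA1", then "0x0E01") followed by a '#' comment, and it skips
lines whose Unicode column is empty. The name loadMappingTable and the
ByteToUnicode type are hypothetical; multi-byte tables and the
platform-specific resource packaging are not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical container: charset byte -> Unicode code point.
using ByteToUnicode = std::map<unsigned, std::uint32_t>;

// Reads "0xXX <whitespace> 0xXXXX [# comment]" lines from `in`.
// Comment-only lines, undefined entries and malformed lines are skipped.
inline ByteToUnicode loadMappingTable(std::istream& in)
{
    ByteToUnicode table;
    std::string line;
    while (std::getline(in, line)) {
        // Drop everything from '#' onward (trailing comment).
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos)
            line.erase(hash);

        std::istringstream fields(line);
        std::string localText, ucsText;
        if (!(fields >> localText >> ucsText))
            continue;                          // blank or undefined entry

        try {
            // Base 16 accepts an optional "0x" prefix.
            const unsigned long local = std::stoul(localText, nullptr, 16);
            const unsigned long ucs = std::stoul(ucsText, nullptr, 16);
            if (local > 0xFF)
                continue;                      // single-byte tables only
            table[static_cast<unsigned>(local)] =
                static_cast<std::uint32_t>(ucs);
        } catch (const std::exception&) {
            continue;                          // not valid hex; skip line
        }
    }
    assert(table.size() <= 256);               // keys are limited to 0..255
    return table;
}
```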

Paul


