Re: i18n of abiword -- combining characters


Subject: Re: i18n of abiword -- combining characters
From: Leonard Rosenthol (leonardr@lazerware.com)
Date: Fri Jan 14 2000 - 17:43:42 CST


At 1:14 PM -0800 1/14/00, Paul Rohr wrote:
>Thai, like some other languages, allows a sequence of individual characters
>to be typed to form a single glyph.

        Arabic, Korean and Yiddish are some others.

>1. Character sequence normalization. (reasonable)
>---------------------------------------------------
>Thus, there needs to be work done (probably at input time) to normalize
>those sequences of combining characters, and perhaps ignore invalid ones.

        If you use the standard OS input methods, they will handle
all this for you - in fact, they will also handle a number of other
input issues that are pretty complex for some languages (especially
CJK).

>(Otherwise, the variant sequences will make features like spell-check
>prohibitively unreliable.)

        And also search & replace. The whole issue of combining
characters in Unicode is an interesting one, especially when doing
things like regular expression searches.
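
        A minimal sketch of the search problem, in Python (my choice
for illustration; none of this is AbiWord code). The helper name
find_normalized is hypothetical. It assumes both strings are brought
to the same normalization form (NFC here, via the standard
unicodedata module) before they are compared, so that the precomposed
and combining spellings of the same text can match.

```python
import unicodedata

def find_normalized(haystack: str, needle: str) -> int:
    """Return the index of needle in the NFC-normalized haystack, or -1.

    Hypothetical helper: the index refers to the normalized haystack,
    not to the original string.
    """
    h = unicodedata.normalize("NFC", haystack)
    n = unicodedata.normalize("NFC", needle)
    return h.find(n)

precomposed = "caf\u00e9"   # 'e with acute' stored as one code point
combining = "cafe\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# A raw code point comparison misses the match ...
naive_hit = precomposed.find(combining)
# ... while comparing normalized forms finds it at the start.
normalized_hit = find_normalized(precomposed, combining)
```

        A regular expression search would presumably need the same
treatment applied to both the pattern and the text.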

>2. Combining characters -- position. (???)
>--------------------------------------------
>The current code assumes that every Unicode character will occupy one cell
>of display space of a known width. However, languages like Thai render
>sequences of several characters into the same display cell.

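        Where Unicode defines a single precomposed code point for
the combined glyph, your input handler should be converting the
multiple characters into that composite value, and then you only
have one character to display. Unicode does not define a precomposed
code point for every valid combination, though, so some sequences
will still reach the display code as several characters.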
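
        The conversion can be sketched with canonical composition
(NFC). This is an illustration in Python under my own assumptions,
not what any OS input method actually does, and the helper name
compose is hypothetical. Composition only collapses a sequence when
Unicode defines a precomposed code point for it, so the Thai pair
below stays as two code points.

```python
import unicodedata

def compose(text: str) -> str:
    """Hypothetical helper: apply canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)

# Latin: 'e' + COMBINING ACUTE ACCENT has a precomposed form, U+00E9.
latin = compose("e\u0301")

# Thai: KO KAI (U+0E01) + SARA I (U+0E34) has no precomposed form,
# so NFC leaves the sequence as two code points.
thai = compose("\u0e01\u0e34")

latin_length = len(latin)   # one code point after composition
thai_length = len(thai)     # still two code points
```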

        HOWEVER, when doing searches, spell checking, etc. you will
need to check if a given character is a composite character and if
so, break it up for parsing.
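
        One way to do that check, sketched in Python with the
standard unicodedata module. The helper names is_composite, break_up
and strip_marks are hypothetical, and treating "composite" as "has a
canonical decomposition" is my assumption about what is wanted here.

```python
import unicodedata

def is_composite(ch: str) -> bool:
    """True if canonical decomposition (NFD) changes the character."""
    return unicodedata.normalize("NFD", ch) != ch

def break_up(text: str) -> list:
    """Split text into the code points of its canonical decomposition."""
    return list(unicodedata.normalize("NFD", text))

def strip_marks(text: str) -> str:
    """Drop marks with a nonzero combining class, e.g. to build a
    loose search key. Whether that is appropriate depends on the
    language; it is only an illustration."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

parts = break_up("\u00e9")        # base letter followed by its accent
key = strip_marks("caf\u00e9")    # accent removed from the search key
```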

>4. Combining characters -- rendering. (???, platform-specific)
>----------------------------------------------------------------
>On each platform, someone will need to investigate whether the
>text-rendering primitives know how to properly combine a character sequence
>into a single glyph. If so, drawing should be pretty easy. If not, adding
>logic to do all that rendering from the constituent glyphs in the font may
>be difficult.
>
        Again, if you use the single combined glyph code point, it
should work just fine when rendered.

Leonard
----------------------------------------------------------------------------
                   You've got a SmartFriend in Pennsylvania
----------------------------------------------------------------------------
Leonard Rosenthol Internet: leonardr@lazerware.com
                                        America Online: MACgician
Web Site: <http://www.lazerware.com/>
FTP Site: <ftp://ftp.lazerware.com/>
PGP Fingerprint: C76E 0497 C459 182D 0C6B AB6B CA10 B4DF 8067 5E65


