Re: i18n of abiword -- combining characters


From: Pierre Abbat (phma@oltronics.net)
Date: Wed Jan 19 2000 - 00:39:22 CST


On further thought, maybe Bison is overkill for the job. I think the way to do
it is to keep an ordered list of rules, run down the list, and apply each rule
to the whole string. Each rule, except the ones that move R and I (rules 6 and 7
below), takes a few characters (usually 2 or 3) and turns them into other
characters. Here's a rough outline:

1. Look for special ligatures, such as T+T (which would otherwise be handled by
removing the danda) or TTH+TTH (which would otherwise be done with a virama,
since TTHA is adandic).

2. If R appears after a consonant, turn it into a slash. (KR, DR, and a few
other adandic combinations have to be handled specially.)

3. If any adandic consonant is followed immediately by a consonant, change the
first consonant to its full syllable and add a virama (e.g. U D CH E ->
U DA \ CH E, where \ is the virama). This rarely happens, since the adandic
combinations that actually occur have ligatures.

4. If any consonant (dandic or not) falls at the end of a word, change it to
its syllable and add a virama.

5. If any vowel is preceded by a consonant, change the consonant to its
syllable and the vowel to its combining form (or delete it if it is A). The
combining form is two characters if it is written both before and after the
consonant (this happens in Gurmukhi).

6. If the combining vowel form is i (or whichever goes before the consonant,
depending on the language), move it back one character, then keep moving it
until it is about to pass anything that isn't a consonant or consonant cluster.

7. If R precedes a consonant, but the previous letter is not a consonant, move
it forward until it passes one vowel (as part of a syllable), turn it into a
candra, and leave it there.
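
To make the "run down the rules" idea concrete, here is a minimal Python sketch of rules 3 to 6 working on space-separated transliteration tokens like the ones in the example above. Everything named here is an assumption made for illustration: the tiny consonant and vowel sets, the "+i"-style names for combining vowel forms, forming a syllable by appending "A", and my reading of rule 6 as "step back over the base syllable and any virama-joined cluster". Rules 1, 2 and 7 are left out.

```python
VIRAMA = "\\"                                  # virama written as a backslash
CONSONANTS = {"K", "T", "D", "CH", "TTH"}      # illustrative subset only
ADANDIC = {"D", "TTH"}                         # consonants without a danda
VOWELS = {"A", "I", "U", "E"}
COMBINING = {"I": "+i", "U": "+u", "E": "+e"}  # hypothetical vowel-sign names
PREBASE = {"+i"}                               # signs written before the consonant


def syllable(consonant):
    """Full-syllable form of a bare consonant, e.g. D -> DA (assumed naming)."""
    return consonant + "A"


def rule3_adandic_before_consonant(tokens):
    """Adandic consonant followed by a consonant -> syllable + virama."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in ADANDIC and nxt in CONSONANTS:
            out += [syllable(tok), VIRAMA]
        else:
            out.append(tok)
    return out


def rule4_final_consonant(tokens):
    """Consonant at the end of the word -> syllable + virama."""
    if tokens and tokens[-1] in CONSONANTS:
        return tokens[:-1] + [syllable(tokens[-1]), VIRAMA]
    return list(tokens)


def rule5_consonant_vowel(tokens):
    """Consonant + vowel -> syllable + combining form (nothing extra for A)."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in CONSONANTS and nxt in VOWELS:
            out.append(syllable(tok))
            if nxt != "A":
                out.append(COMBINING[nxt])
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


def rule6_move_prebase_vowel(tokens):
    """Move a pre-base vowel sign back over its syllable or cluster."""
    out = []
    for tok in tokens:
        if tok in PREBASE and out:
            j = len(out) - 1                   # step back over the base syllable
            while j >= 2 and out[j - 1] == VIRAMA:
                j -= 2                         # and over each virama-joined member
            out.insert(j, tok)
        else:
            out.append(tok)
    return out


RULES = [
    rule3_adandic_before_consonant,
    rule4_final_consonant,
    rule5_consonant_vowel,
    rule6_move_prebase_vowel,
]


def shape_word(tokens):
    """Apply every rule, in order, to the whole word."""
    for rule in RULES:
        tokens = rule(tokens)
    return tokens


print(" ".join(shape_word("U D CH E".split())))   # U DA \ CHA +e
print(" ".join(shape_word("U D CH I".split())))   # U +i DA \ CHA
```

Each rule sees the complete output of the one before it, so the order of the list matters: rule 5 has to run before rule 6, because rule 6 only looks for the combining forms that rule 5 produces.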

This whole set of rules can be easily implemented on a Turing machine that can
stretch and shrink its tape. In practice, I think I'll use two buffers, one
holding the beginning of the string and the other holding the end.
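
A rough Python sketch of the two-buffer idea, for anyone who wants to see how the stretching and shrinking works out. The class and function names (TwoBufferTape, apply_rewrite) and the "TT" ligature token are made up for this illustration, and storing the second buffer reversed is just one convenient choice, not something decided above.

```python
class TwoBufferTape:
    """A string split at a cursor: `before` is the beginning, `after` the end.

    `after` is kept reversed, so the token at the cursor is after[-1] and
    both buffers grow and shrink at the end of a Python list.
    """

    def __init__(self, tokens):
        self.before = []
        self.after = list(reversed(tokens))

    def at_end(self):
        return not self.after

    def peek(self, n):
        """Up to n tokens to the right of the cursor, in reading order."""
        return self.after[-1:-n - 1:-1]

    def advance(self):
        """Move the cursor one token to the right."""
        self.before.append(self.after.pop())

    def replace(self, n, replacement):
        """Replace the n tokens at the cursor; the tape stretches or shrinks."""
        del self.after[len(self.after) - n:]
        self.before.extend(replacement)    # cursor lands after the new tokens

    def tokens(self):
        return self.before + self.after[::-1]


def apply_rewrite(tokens, pattern, replacement):
    """Run one fixed pattern -> replacement rule over the whole string."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    tape = TwoBufferTape(tokens)
    while not tape.at_end():
        if tape.peek(len(pattern)) == pattern:
            tape.replace(len(pattern), replacement)
        else:
            tape.advance()
    return tape.tokens()


# Stretching: two tokens become three.
print(apply_rewrite(["U", "D", "CH", "E"], ["D", "CH"], ["DA", "\\", "CH"]))
# ['U', 'DA', '\\', 'CH', 'E']

# Shrinking: two tokens become one (hypothetical ligature token).
print(apply_rewrite(["T", "T", "A"], ["T", "T"], ["TT"]))
# ['TT', 'A']
```

Because the replacement goes straight into the first buffer, a rule never rescans its own output, which keeps a single pass from looping forever.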

Pruet, and anyone else who knows how to write any of these Nagari-derived
scripts, please tell me if this will be easy to adapt to your language.

phma


