Re: Unicode not enough?


Subject: Re: Unicode not enough?
From: Pierre Abbat (phma@oltronics.net)
Date: Tue Jun 05 2001 - 18:40:03 CDT


It's not just CJK. Unicode is unwieldy for processing the Nagari scripts. Each
Nagari script is allotted 128 codes, laid out in largely the same way in most
of them. The problem is that the codes are assigned neither by letter (of which
there are about 50) nor by glyph, but by a haphazard mixture of both. There is
no code for "t"; you have to code "ta" and a virama. Then, when rendering the
word, you print "ta" with the virama if it falls at the end of the word, "t" by
itself if it can simply lose its danda (vertical stroke) when combining with
the next consonant, or a special glyph in some combinations such as "tt". Since
alphabetizing is done on the letter sequence, not the glyph sequence, sorting
is unwieldy in Unicode. The proper way to
do it, since Unicode's stated aim is to encode letters, not glyphs, would be to
have codes for a, aa, i, ... k, kh, g, gh, ng, ..., and no code for virama. But
that would make fonts hard to code, since they need no code for k but do need
codes for ksha, jnya, ji (in Gujarati), etc. To code all the glyphs would take
512 codes, probably even more in Sanskrit (which runs words together in
writing, with the consonants at the end of one word joined to the beginning of
the next).
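
To make the mismatch concrete, here is a rough Python sketch of what a sorting
routine has to do before it can compare words: turn the stored code sequence
("ta" + virama, and so on) back into a letter sequence. The helper names
(to_letters, label, is_consonant, is_vowel_sign) are made up for illustration,
and the sketch assumes plain Devanagari text using only the basic consonants
(U+0915..U+0939), the dependent vowel signs (U+093E..U+094C), and the virama
(U+094D). It makes no attempt to handle nukta, anusvara, or the other signs
properly.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

def is_consonant(ch):
    # Devanagari consonants KA..HA occupy U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"

def is_vowel_sign(ch):
    # Dependent vowel signs AA..AU occupy U+093E..U+094C.
    return "\u093E" <= ch <= "\u094C"

def label(ch):
    # Tail of the Unicode character name, e.g. "DEVANAGARI LETTER TA" -> "ta".
    name = unicodedata.name(ch)
    for marker in ("VOWEL SIGN ", "LETTER "):
        if marker in name:
            return name.split(marker, 1)[1].lower()
    return name.lower()

def to_letters(text):
    """Turn a stored code sequence into a letter sequence (illustrative only)."""
    letters = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if is_consonant(ch):
            letters.append(label(ch)[:-1])  # drop the inherent "a": "ta" -> "t"
            if nxt == VIRAMA:
                i += 1                      # virama: no vowel follows
            elif nxt and is_vowel_sign(nxt):
                letters.append(label(nxt))  # explicit vowel replaces inherent "a"
                i += 1
            else:
                letters.append("a")         # inherent vowel
        else:
            letters.append(label(ch))       # independent vowel or other sign
        i += 1
    return letters

satya = "\u0938\u0924\u094D\u092F"  # SA, TA, VIRAMA, YA
print([f"U+{ord(c):04X}" for c in satya])  # ['U+0938', 'U+0924', 'U+094D', 'U+092F']
print(to_letters(satya))                   # ['s', 'a', 't', 'y', 'a']
```

Four stored codes become five letters here, and the "t" exists only as the pair
"ta" + virama. That unpacking step is the extra work that alphabetizing on the
letter sequence requires.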

phma


