RE: to support CJK(Chinese Japanese Korea)


Subject: RE: to support CJK(Chinese Japanese Korea)
From: Andrew Goh (andrew@ghimmoh.lugs.org.sg)
Date: Mon Feb 07 2000 - 04:45:57 CST


On Sun, 6 Feb 2000, Henrik Berg wrote:
>
> > These two work is very simple. Others such as spell check, print, import MS
> > Word should be modified also.
>
> Spell check is a problem. ispell (the base of the spell checker) is very 8-bit (even 7-bit) orientated. If anyone has an example of ispell used for 16bit/multibyte languages, it would be nice for me to get it to study how it's done.
>
hi,

just to cast a little light on 'spelling' and the support requirements
for Chinese characters:

each 2-byte Chinese character normally represents 1 word (i.e. a word is
not normally formed by stringing several characters together).

several 2-byte Chinese characters combine to form a phrase or sentence,
so potentially you'd have a grammar checker here instead of a 'spell'
checker.
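
To make the '2 byte' point concrete, here is a minimal Python sketch,
not anything taken from abiword or ispell. The helper name byte_widths
and the sample strings are made up for illustration; the codec names
(gb2312, utf-16-be) are standard Python codecs. It shows that each
Chinese character takes 2 bytes in these encodings, which is why a tool
that walks text one 8-bit byte at a time would cut characters in half.

```python
def byte_widths(text, encoding):
    """Return the number of bytes each character occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]


def split_bytewise(text, encoding):
    """Mimic an 8-bit tool: treat every single byte as one 'letter'."""
    raw = text.encode(encoding)
    return [raw[i:i + 1] for i in range(len(raw))]


if __name__ == "__main__":
    sample = "中文"  # two Chinese characters (illustrative sample)
    print(byte_widths(sample, "gb2312"))      # 2 bytes per character
    print(byte_widths(sample, "utf-16-be"))   # 2 bytes per character
    print(byte_widths("abc", "gb2312"))       # ASCII stays 1 byte each
    # A byte-oriented checker would see four 'letters' here, not two:
    print(len(split_bytewise(sample, "gb2312")))
```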

The hard part in adding Chinese character support to abiword is
probably
a) to create a font set of the 5000-20000 or so Chinese characters that
   are in common use.
b) to add an input mechanism so that the user can 'type' in the
   characters. In present day systems, keyboard input normally takes
   the form of either
   1) sequences representing combinations of strokes to 'draw' the
      Chinese character.
   2) some kind of phonetic representation (e.g. han yu pin yin)
      so that the user can narrow down the scope of selecting
      characters from the 5000-20000 set. you can think of a depth
      first traversal of a tree in which each node represents a
      'sounds like' pronunciation.
c) to support unicode (2 byte) characters in all the critical functions
   (rendering, storage, printing, internal representation, etc.)
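
The phonetic narrowing in b) 2) can be sketched roughly as below. This
is only an illustration under my own assumptions: the names PinyinNode,
build_trie and candidates are hypothetical, and the six-entry table is
made up (a real input method would load thousands of entries from a
dictionary). Each node in the tree stands for the letters typed so far,
and a depth first walk under that node collects the characters that
still match.

```python
# Hypothetical, tiny pinyin-to-character table for illustration only.
SAMPLE_ENTRIES = [
    ("ma", "妈"), ("ma", "马"), ("ma", "吗"),
    ("man", "慢"), ("mao", "猫"), ("mi", "米"),
]


class PinyinNode:
    """One node per typed prefix; `chars` holds exact matches for it."""

    def __init__(self):
        self.children = {}
        self.chars = []


def build_trie(entries):
    """Insert every (syllable, character) pair into a prefix tree."""
    root = PinyinNode()
    for syllable, char in entries:
        node = root
        for letter in syllable:
            node = node.children.setdefault(letter, PinyinNode())
        node.chars.append(char)
    return root


def candidates(root, typed):
    """Depth first walk below the typed prefix, collecting characters."""
    node = root
    for letter in typed:
        node = node.children.get(letter)
        if node is None:
            return []  # nothing sounds like this prefix
    found = []
    stack = [node]
    while stack:
        current = stack.pop()
        found.extend(current.chars)
        # Push in reverse order so children are visited alphabetically.
        for letter in sorted(current.children, reverse=True):
            stack.append(current.children[letter])
    return found


if __name__ == "__main__":
    trie = build_trie(SAMPLE_ENTRIES)
    print(candidates(trie, "m"))   # every character starting with 'm'
    print(candidates(trie, "ma"))  # narrowed down after one more key
```

Each extra keystroke moves one level down the tree, so the list the
user has to pick from shrinks as they type.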

Up to this point, due in part to the above features of the Chinese
character sets, spelling is not usually a problem.

The easy part is probably rendering the Chinese characters,
i.e. Chinese characters are all block (square) characters and
are normally equally spaced.
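
As a rough sketch of why equal spacing keeps layout simple (again my own
illustration, not abiword code; layout_grid, cell_size and line_width
are hypothetical names), the position of every character follows
directly from its index once all cells are the same square size:

```python
def layout_grid(text, cell_size, line_width):
    """Place each character in a fixed square cell, wrapping at line_width.

    Returns a list of (character, x, y) tuples in pixel units.
    """
    per_line = max(1, line_width // cell_size)  # cells that fit on a line
    positions = []
    for index, char in enumerate(text):
        row, col = divmod(index, per_line)
        positions.append((char, col * cell_size, row * cell_size))
    return positions


if __name__ == "__main__":
    # 16-pixel cells on a 48-pixel-wide line: three characters per line.
    for char, x, y in layout_grid("中文字体", 16, 48):
        print(char, x, y)
```

No per-character width lookup is needed, unlike with proportional Latin
fonts.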

Just my 2 cents.

Perhaps someone here would like to shed some light on Japanese /
Korean character sets.

Hope this is interesting to non-Chinese developers.

Cheers,
  Andrew


