Re: May I take part in AbiWord CJKV develope? (fwd)


From: Chih-Wei Huang (cwhuang@linux.org.tw)
Date: Tue Oct 24 2000 - 22:45:46 CDT


Vlad Harchev wrote:
>
> > On Fri, 20 Oct 2000, Belcon wrote:
>
> > > > Also, as I remember, CJK "words" can be wrapped at any letter (so layout
> > > Sorry for my poor English, I can't understand what you said. What do you
> > > mean here?
> >
> > For Japanese, as I remember, the rules allow you to put as many letters
> > of a word as fit on the current line, and the remaining letters of that word
> > start on the next line (i.e. the rules allow you to break a word at any character).
> > So, for example, with monospaced fonts and English, if there are 4 cells left
> > on the current line and the next word is "abiword", those 4 cells will remain
> > empty since "abiword" is 7 characters long, and the next line will start with
> > "abiword". For Japanese in the same situation, "abiw" would be placed on the
> > current line and "ord" would go to the next line.
> >
> > So, do CJKV rules allow breaking words at any letter?

Yes!
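
Vlad's example can be modelled with two tiny Python 3.9+ functions. This is a sketch under the assumption, taken from his monospaced example, that every character occupies exactly one cell (real CJK glyphs are usually double-width, which is ignored here). The names place_word_latin and place_word_cjk are hypothetical; each returns the part that stays on the current line and the part that moves to the next one.

```python
def place_word_latin(word: str, remaining: int) -> tuple[str, str]:
    """Hypothetical helper: Latin-style breaking, the word moves whole if it doesn't fit."""
    if len(word) <= remaining:
        return word, ""
    return "", word


def place_word_cjk(word: str, remaining: int) -> tuple[str, str]:
    """Hypothetical helper: CJK-style breaking, fill the line and carry the rest over."""
    return word[:remaining], word[remaining:]


print(place_word_latin("abiword", 4))    # ('', 'abiword')
print(place_word_cjk("abiword", 4))      # ('abiw', 'ord')
print(place_word_cjk("日本語の文章", 4))  # ('日本語の', '文章')
```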

> Thanks again! I think it is impossible for an application to know where
> it should wrap a Chinese word (whether GB2312 or Big5), because the
> meaning of Chinese words is very complex. (But one point is important:
> we can't break any Chinese character into two parts, because it is
> represented by two bytes.)
> I am not sure about Japanese and Korean because I am Chinese. Does anyone
> on this mailing list have an idea about that?
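
Belcon's point about never splitting a two-byte character can be illustrated with a minimal sketch, assuming Python 3.9+ and its built-in big5 and gb2312 codecs; split_encoded is a hypothetical helper. Cutting the raw bytes at an odd offset breaks a character in half, whereas decoding first and splitting at a character index always lands on a character boundary.

```python
def split_encoded(data: bytes, encoding: str, n_chars: int) -> tuple[bytes, bytes]:
    """Hypothetical helper: split encoded text after n_chars characters.

    Decoding first means the cut always falls on a character boundary,
    never between the two bytes of a GB2312 or Big5 character.
    """
    text = data.decode(encoding)
    return text[:n_chars].encode(encoding), text[n_chars:].encode(encoding)


big5 = "中文".encode("big5")  # two characters, two bytes each
print(len(big5))  # 4

try:
    big5[:3].decode("big5")  # a byte-level cut through the second character
except UnicodeDecodeError as err:
    print("broken character:", err.reason)

head, tail = split_encoded(big5, "big5", 1)
print(head.decode("big5"), tail.decode("big5"))  # 中 文
```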

Hello, Belcon, I think you misunderstood what Vlad meant.
The issue Vlad mentioned is the line-breaking problem.
Generally speaking, you're right, Vlad.
The line-breaking algorithm for CJK is much simpler than for Latin scripts.
You can break a line between any two CJK characters.
Of course, there are some exceptions.
There are a few typesetting rules for Chinese.
For example, punctuation marks (U+3000 to U+303F)
cannot be placed at the beginning of a line.
However, the rules are not very strict.
You can just ignore them for now.

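
The rule described above (break between any two CJK characters, but never start a line with a mark from U+3000 to U+303F) can be sketched as a greedy wrapper in Python 3.9+. This is a sketch under stated assumptions: the names are hypothetical, widths are counted in characters rather than display cells, and keeping the offending mark at the end of the previous line (letting it hang past the width) is only one possible strategy; another is to carry the preceding character down with it. Some common Chinese marks, such as the fullwidth comma U+FF0C, fall outside this range, so a fuller implementation would need a longer list.

```python
def is_forbidden_at_line_start(ch: str) -> bool:
    """Rule from the post: marks in U+3000 to U+303F may not start a line."""
    return 0x3000 <= ord(ch) <= 0x303F


def wrap_cjk(text: str, width: int) -> list[str]:
    """Hypothetical greedy wrapper that may break between any two characters.

    A forbidden mark that would begin a new line is appended to the
    previous line instead, so that line may exceed `width`.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines: list[str] = []
    current = ""
    for ch in text:
        if len(current) < width or (current and is_forbidden_at_line_start(ch)):
            current += ch  # room left, or a mark that must hang on this line
        else:
            lines.append(current)  # line is full: break before ch
            current = ch
    if current:
        lines.append(current)
    return lines


print(wrap_cjk("今天天氣很好。我們去公園吧。", 6))
# ['今天天氣很好。', '我們去公園吧。']
```

Without the exception, a plain 6-character cut would produce the lines 今天天氣很好, 。我們去公園 and 吧。, with the full stop U+3002 starting the second line.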
> > > > Also, is there any sense of "case" for CJKV "characters" - i.e. are there
> > > > uppercased and lowercased versions of the same character? Probably some
> > > > AW logic could depend on this distinction.

No, CJK characters are not case-sensitive.
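
A quick way to check this, sketched in Python 3 (has_case is a hypothetical helper): case mapping leaves Han, kana and Hangul characters unchanged. Fullwidth Latin letters, which often appear in CJK text, do have case, so AbiWord logic that depends on case would still need to handle them.

```python
def has_case(ch: str) -> bool:
    """Hypothetical helper: True if upper- or lowercasing changes the character."""
    return ch.upper() != ch or ch.lower() != ch


for ch in "中文あア한":
    print(ch, has_case(ch))  # False for each: no case distinction

print(has_case("a"), has_case("ａ"))  # True True (ASCII and fullwidth Latin)
```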

-- 
   ~     Chih-Wei Huang (cwhuang)
  'v'    E-Mail       : cwhuang@linux.org.tw
 // \\   CLDP Project : http://www.linux.org.tw/CLDP/ (Coordinator)
/(   )\  CLE  Project : http://cle.linux.org.tw/CLE/  (Developer)
 ^`~'^   HomePage     : http://www.cwhuang.idv.tw/



