Re: May I take part in AbiWord CJKV develope? (fwd)


Subject: Re: May I take part in AbiWord CJKV develope? (fwd)
From: hj (huangj@citiz.net)
Date: Mon Oct 23 2000 - 08:35:57 CDT


----- Original Message -----
From: Vlad Harchev <hvv@hippo.ru>
To: <abiword-dev@abisource.com>
Sent: October 23, 2000 16:10
Subject: Re: May I take part in AbiWord CJKV develope? (fwd)

>
> ---------- Forwarded message ----------
> Date: Mon, 23 Oct 2000 10:02:30 +0800
> From: Belcon <rainfall@yeah.net>
> To: Vlad Harchev <hvv@hippo.ru>
> Subject: Re: May I take part in AbiWord CJKV develope? (fwd)
>
> Hello Vlad:
> >
> > Best regards,
> > -Vlad
> >
> > ---------- Forwarded message ----------
> > Date: Fri, 20 Oct 2000 12:48:54 +0500 (SAMST)
> > From: Vlad Harchev <hvv@hippo.ru>
> > To: Belcon <rainfall@yeah.net>
> > Subject: Re: May I take part in AbiWord CJKV develope?
> >
> > On Fri, 20 Oct 2000, Belcon wrote:
> >
> > Hello Belcon,
> >
> > > Dear Vlad:
> > >
> > > Vlad Harchev wrote
> > > >
> > > > On Thu, 19 Oct 2000, Belcon wrote:
> > > >
> > > > Hello,
> > > >
> > > > >
> > > > > Dear Vlad Harchev:
> > > > >
> > > > > I hear that you are looking for an AbiWord CJKV developer. I am
> > > > > not an expert, but I would be glad to do something for this
> > > > > project.
> > > >
> > > > Very nice to hear that.
> > > >
> > > > > I know something about CJKV (GB2312, Big5, SJIS, JIS-EUC, KSC), and
> > > > > I have spent one month on AbiWord's source code. But I know little
> > > > > about gtk+ (I will learn it over the next several months).
> > > > > Would you please show me a way into this project?
> > > >
> > > > Of course, I will try to help you as much as I can.
> > > >
> > > > > BTW: Could you give me some information about how AbiWord
> > > > > processes documents? I know some of it from reading the source
> > > > > code, but not all. As you know, the PieceTable is easy to
> > > > > understand, but the source code is a different story.
> > > > >
> > > > > Sorry for my bad English!
> > > >
> > > > It's nice IMO. I am sorry for my English too :)
> > > >
> > > > I should say that I'm not fluent in the internals of AW at all - I
> > > > don't know anything about PieceTable etc. And this knowledge is not
> > > > required. I think you should look at my patch that added support for
> > > > non-latin1 single-byte characters:
> > > > http://www.hippo.ru/~hvv/abiword/awrus-patch.gz
> > > > to see what changes were needed. Some portions of this patch affect
> > > > GUI-oriented things (improving dialog layout, for example), but they
> > > > are not language- or encoding-specific, so they can effectively be
> > > > ignored. So, you don't need to know gtk at all to add CJK support to
> > > > AW.
> > >
> > > I am glad to hear this. :-) But I still need to learn some other
> > > things.
> >
> > Moreover, this (or last) week someone announced the existence of
> > patches for Chinese support in AW! (It seems this is your first target.)
> > Here is the location of the patch:
> > http://www.hj.webprovider.com/develope/index.html
> > I will forward that message, with the URL of a screenshot, to you too.
>
> Yes, I already know that place. I think it is great work.
>
> >
> > > > The key component of that patch is the XAP_EncodingManager class,
> > > > which provides information about encodings and converts between the
> > > > native locale's encoding and Unicode. It's somewhat simple-minded,
> > > > since it assumes single-byte locales (or at least the code using it
> > > > assumes so). It supports conversion of a single character. That
> > > > model is not sufficient for CJK text, since those encodings encode
> > > > each character using a shift state or whatever - so methods for
> > > > converting to/from a string (still one symbol at a time, but
> > > > maintaining the shift state) need to be implemented and used instead
> > > > of the present ones. Nearly all places that call
> > > > XAP_EncodingManager.* should be modified to use the functions that
> > > > convert to/from a string while maintaining state. Probably new
> > > > classes like "XAP_EncodingManager::UToNativeConverter" and
> > > > "XAP_EncodingManager::NativeToUConverter" should be created that
> > > > wrap the iconv() function and are used instead of
> > > > XAP_EncodingManager's methods. I will sketch these classes on
> > > > Saturday.
> > > >
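[Editor's sketch, not part of the original message.] The stateful conversion described above can be illustrated outside AbiWord. The Python sketch below is only an analogy: the class names are borrowed from the proposal, but they are hypothetical here and wrap Python's standard incremental codecs rather than iconv(). It shows why a converter object that keeps state between calls can handle multi-byte text that arrives in arbitrary pieces, which a one-character-at-a-time, stateless call cannot.

```python
import codecs


class NativeToUConverter:
    """Hypothetical stateful converter: native-encoded bytes -> Unicode text."""

    def __init__(self, encoding):
        # The incremental decoder remembers a partially received character
        # (and any shift state) between calls.
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def convert(self, chunk, final=False):
        return self._decoder.decode(chunk, final)


class UToNativeConverter:
    """Hypothetical stateful converter: Unicode text -> native-encoded bytes."""

    def __init__(self, encoding):
        self._encoder = codecs.getincrementalencoder(encoding)()

    def convert(self, text, final=False):
        return self._encoder.encode(text, final)


def decode_in_chunks(raw, encoding, chunk_size):
    """Decode bytes that arrive in fixed-size chunks, even mid-character."""
    conv = NativeToUConverter(encoding)
    parts = [conv.convert(raw[i:i + chunk_size])
             for i in range(0, len(raw), chunk_size)]
    parts.append(conv.convert(b"", final=True))  # flush any pending state
    return "".join(parts)


def encode_in_chunks(text, encoding, chunk_size):
    """Encode text piece by piece with one converter object."""
    conv = UToNativeConverter(encoding)
    parts = [conv.convert(text[i:i + chunk_size])
             for i in range(0, len(text), chunk_size)]
    parts.append(conv.convert("", final=True))  # flush any pending state
    return b"".join(parts)
```

With a chunk size of 1, every two-byte GB2312 character is split across two calls, and the decoder still reassembles it. An encoding with a real shift state, such as "iso2022_jp", can be passed to the same helpers.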
> > > > As for the GUI - does GTK support CJK acceptably (i.e. do some
> > > > generic apps support CJK)? For example, does gtk correctly draw CJK
> > > > text in labels on buttons?
> > >
> > > Yes, GTK supports CJK. (As you know, some people in China like me are
> > > trying their best to localize AbiWord, and they are making progress.
> > > AbiWord can now use Big5 TrueType fonts very well. The only problem
> > > with GB2312 is that AbiWord doesn't show anything on screen if there
> > > is a GB2312 character. :-( I guess GTK is being used in a wrong way
> > > somewhere.) I have never tried Japanese and Korean, but I found that
> > > GTK+-1.2.8 has files named "gtkrc.ko" and "gtkrc.ja". Since GB2312
> > > and Big5 work well, I guess KSC5601 and JISX0208 work well too. :-)
> >
> > Nice to hear that. As I understand it, some patches already exist that
> > make AW partially support CJKV?
> > As for the reason GB2312 is not shown - that is how non-patched AW
> > works, not GTK :(
> >
>
> A developer from the Taiwan CLE group has made a patch that supports
> GB2312 and Big5. But there is a very big problem, as I said above. I
> don't know why AbiWord can't show GB2312 (although it can print GB2312
> properly) while Big5 is fine once patched.

My Chinese patch is for GB2312. It can display and print GB2312. I think it
can display and print Korean or Japanese if you modify fonts.hj.

> I am working on this problem now. And I think that once this problem is
> solved, CJKV support in AbiWord will be simple IMO.
>
>
> > > >
> > > > What is your favourite development platform? Mine is linux, and I
> > > > hope yours is some unix too. This is somewhat important, since my
> > > > patch adds full support for non-latin1 only on the unix platform
> > > > (though only a little effort is required to add support for the
> > > > other platforms AW supports).
> > > >
> > >
> > > My favourite development platform is linux, just like yours. I really
> > > love this platform, although I started using it not very long ago.
> >
> > Very nice to hear that.
> >
> > > > I think it would be nice if our conversation were put on the Web in
> > > > downloadable form, since it may contain "wise thoughts" that would
> > > > be useful for developers who decide to add CJK support to other
> > > > apps. Would you mind if our communications became publicly
> > > > available at some point?
> > > >
> > > Sure, it is really a good idea. I hope many people can join this
> > > project.
> >
> > I plan to forward all messages from our conversation to the AW mailing
> > list first. I hope you won't mind this either.
> >
> > > > Do CJKV versions of Microsoft Word exist? If they don't, there
> > > > won't be any need to bother with exporting and importing CJKV
> > > > documents. If they do, could you send a very small rtf document in
> > > > CJK? Are
> > >
> > > Yes, MS Word has CJKV versions. I must admit that it supports CJKV
> > > very well, although it is painful to wait for MS Word to start up.
> > > Because I only have the GB2312 version of MS Word, I have no way to
> > > give you CJKV documents other than GB2312. I have no place to upload
> > > this document, so I have to attach it to this letter. Luckily, it is
> > > not very big.
> >
> > Yes, it's very small. Thanks for it.
> > You mentioned the GB2312 version of MSWord - are there other versions of
> > MSWord that support other CJKV encodings? If yes, can they correctly
> > read files
>
> As far as I know, yes.
>
> > produced by other MSWords with support for different encodings?
> >
>
> No, we always use a third-party application to show Big5 MSWord
> documents while using the GB2312 version of MSWord.
>
> > > > CJK-equivalents of Type1 fonts corresponding to Helvetica, Times New
> > > > Roman and the others supplied with AW exist (you can look at
> > > > ghostscript - that's the piece of software that most probably will
> > > > support them)? How many glyphs (characters or hieroglyphs) are in
> > > > CJKV fonts? Does Ghostscript work OK with CJKV at all? If yes, could
> > > > you please put some very small document in PS format up for download
> > > > (and compress it)? Does the TeX typesetting system work with CJK at
> > > > all (there is a latex exporter in AW)?
> > > >
> > >
> > > As far as I know, ghostscript 6.01 supports CJKV TrueType fonts (there
> > > may be something useful to you here:
> > > http://www.aihara.co.jp/~taiji/tops/ ). But its license is not GPL, so
> > > we have to give it up. :-( gs5.05 only supports Type1 fonts, so we
> > > have to generate these fonts from TTF, which makes the gs5.05 package
> > > very big.
> > > Most CJKV fonts are TrueType fonts. I think using TTF is the easy way
> > > IMO.
> > > Of course, we can generate Type1 fonts from TTF.
> >
> > It's nice that gs supports CJKV in some form (for some fonts).
> >
> > > According to some resources, GB2312-80 has 7,445 characters, Big5 has
> > > 13,494 characters, JISX 0208-1990 has 6,879 characters, and KSC
> > > 5601-1992 has 8,224 characters.
> > > These are all basic character sets. Their supersets have more
> > > characters. GB2312's superset, GBK, has 21,886 characters. Sounds like
> > > a nightmare?
> >
> > I'm glad I don't have to study CJKV languages :)
> > As long as all supersets of CJKV can be represented in UCS-2, it's OK :)
> > Even if they can't - that's not fatal.
> >
> > > Two months ago, I made gs5.05 show CJKV fonts. But it needs Type1
> > > font support.
> > > So even if I send a ps document to you, you still can't read it.
> >
> > I just wanted to see what PS prolog is emitted to allow CJKV fonts in ps
> > files. So, if you can generate a small .ps file that does not contain
> > the fonts themselves, I would like to see it (not in gv, just the raw
> > .ps).
> >
> I am not sure what you mean. But I can give you my test ps file. (Sorry,
> I have been too lazy to normalize Big5, JISX and KSC the way I did
> GB2312.)
>
> > > I am sorry that I know nothing about TeX. Anyway, I am a newbie to
> > > linux.
> >
> > It's a typesetting system used by scientists (source file extensions are
> > .tex and .latex, and the output format is .dvi, which is converted to
> > .ps using the 'dvips' utility). Chances are very low that it supports
> > CJKV.
> >
> Thank you so much!
>
> > > > The biggest problem would be printing - the generation of a PS
> > > > prologue that will make the fonts work in GS.
> > > >
> > > Yes, we need GS support.
> > >
> > > > Also, are there X font servers that can display CJKV Type1 fonts
> > > > (most probably, yes)?
> > >
> > > Yes, there are many X font servers that can display CJKV fonts.
> >
> > Can they display Type1 fonts too?
> >
> Sure, they can!
>
> > > >
> > > > Also, I have the impression that support for Vietnamese is already
> > > > available in AW thanks to my non-latin1 single-byte characters
> > > > megapatch, since all Vietnamese characters can be represented as one
> > > > byte.
> > > >
> > > > Also, as I remember, CJK "words" can be wrapped at any letter (so
> > > > layout
> > >
> > > Sorry for my poor English, I can't understand what you said. What do
> > > you mean here?
> >
> > In English, if there is not enough space at the end of the current line
> > for some word, that line is "finished" and the next line starts with
> > that word (if hyphenation is disabled, of course).
> > For Japanese, as I remember, the rules allow you to put as many letters
> > of that word as fit on the current line, and the remaining letters of
> > that word start the next line (i.e. the rules allow you to break a word
> > at any character).
> > So, for example, with a monospaced font and English, if there are 4
> > cells left on the current line and the next word is "abiword", those 4
> > cells will remain empty since "abiword" is 7 characters long, and the
> > next line will start with "abiword". For Japanese in the same
> > situation, "abiw" would be placed on the current line and "ord" would
> > go to the next line.
> >
> > So, do CKV rules allow breaking words at any letter?
> >
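[Editor's sketch, not part of the original message.] The two wrapping rules contrasted above can be written down in a few lines of Python. Both functions are hypothetical illustrations, not AbiWord layout code. They assume a monospaced font where every character occupies one cell, and they ignore the extra rules real CJK typesetting applies (for example, punctuation that may not start a line).

```python
def wrap_words(text, width):
    """English-style rule with hyphenation off: break only at spaces."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            # The word fits, or the line is empty and must take the word.
            current = candidate
        else:
            # The word does not fit: the rest of the line stays empty.
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def wrap_anywhere(text, width):
    """Rule described for Japanese: a break is allowed after any character."""
    return [text[i:i + width] for i in range(0, len(text), width)]
```

With a 10-cell line that already holds the 6-letter word "editor", `wrap_words("editor abiword", 10)` leaves the last 4 cells empty and moves "abiword" to the next line, while `wrap_anywhere("editorabiword", 10)` fills the line with "abiw" and carries "ord" over.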
> Thanks again! I think it is impossible for an application to know where
> it should wrap a Chinese word (in both GB2312 and Big5), because the
> meaning of Chinese words is very complex. (But one point is important: we
> can't break any Chinese character into two parts, because it is
> represented by two bytes.)
> I am not sure about Japanese and Korean because I am Chinese. Does anyone
> on this mailing list have an idea about that?
>
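[Editor's sketch, not part of the original message.] The point about never cutting a two-byte character in half can be shown with a small helper. This is a minimal Python sketch with a hypothetical function name. It assumes well-formed input scanned from the start, in which every byte with the high bit set begins a two-byte character (true for GB2312 in its usual EUC-CN form) and every other byte is single-byte ASCII.

```python
def split_on_char_boundary(data, limit):
    """Split bytes into (head, tail) with len(head) <= limit.

    Assumption: any byte >= 0x80 is the first byte of a two-byte
    character, so the cut never lands between the two bytes.
    """
    i = 0
    while i < len(data):
        step = 2 if data[i] >= 0x80 else 1
        if i + step > limit:
            break  # the next character would not fit completely
        i += step
    return data[:i], data[i:]
```

For the 5-byte GB2312 encoding of "A中文", a limit of 2 yields only "A" in the head, because taking one more byte would cut 中 in half.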
> > > > logic should respect this). Am I right? What other languages from
> > > > CJKV allow such wrapping? I think the core AW developers will make
> > > > that change themselves if we ask nicely, so there is no need to study
> > > > AW layout internals at all IMO - others will fix the logic for us.
> > > >
> > > > Also, is there any sense of "case" for CJKV "characters" - i.e. are
> > > > there uppercase and lowercase versions of the same character?
> > > > Probably some AW logic could depend on this distinction.
> > >
> > > As far as I know, there is no sense of case for CJKV characters. But
> > > the same character may appear in different places of a character set.
> > > Not many, just a few.
> >
> > OK.
> >
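[Editor's sketch, not part of the original message.] The answer about case can be checked against Unicode data. The Python sketch below uses a hypothetical helper name and only the standard library. One caveat worth knowing: CJK character sets such as GB2312 also contain fullwidth Latin letters, and those do have upper and lower case, so case-dependent logic cannot assume that every character in a CJK document is caseless.

```python
import unicodedata


def has_case(ch):
    """True if the character changes under upper- or lower-casing."""
    return ch.upper() != ch or ch.lower() != ch


def describe(ch):
    """Return (Unicode general category, has_case) for one character."""
    return unicodedata.category(ch), has_case(ch)
```

Here `describe("中")` reports the category "Lo" (letter, other) with no case, while the fullwidth "Ａ" (U+FF21) is reported as cased.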
> > > >
> > > > So, could you please answer these questions.
> > > > Thanks for your efforts.
> > > >
> > >
> > > It is my pleasure.
> > >
> > > Cheers!
> > > > P.S.: You could simply call me Vlad, without surname.
> >
> > Thank you for your answers.
> >
> > Best regards,
> > -Vlad
> Cheers
>
>


