Subject: Re: May I take part in AbiWord CJKV develope? (fwd)
From: Vlad Harchev (hvv@hippo.ru)
Date: Sun Oct 22 2000 - 04:30:28 CDT


 Best regards,
  -Vlad

---------- Forwarded message ----------
Date: Fri, 20 Oct 2000 12:48:54 +0500 (SAMST)
From: Vlad Harchev <hvv@hippo.ru>
To: Belcon <rainfall@yeah.net>
Subject: Re: May I take part in AbiWord CJKV develope?

On Fri, 20 Oct 2000, Belcon wrote:

 Hello Belcon,

> Dear Vlad:
>
> Vlad Harchev wrote
> >
> > On Thu, 19 Oct 2000, Belcon wrote:
> >
> > Hello,
> >
> > >
> > > Dear Vlad Harchev :
> > >
> > > I heard that you are looking for an AbiWord CJKV developer. I am not an
> > > expert, but I would be glad to do something for this project.
> >
> > Very nice to hear that.
> >
> > > I know something about CJKV (GB2312, Big5, SJIS, JIS-EUC, KSC), and I have
> > > spent one month on AbiWord's source code. But I know little about
> > > gtk+ (I will learn it over the next several months).
> > > Would you please show me a way into this project?
> >
> > Of course, I will try to help you as much as I can.
> >
> > > BTW: Would you give me some information about AbiWord's processing?
> > > I know some of it from reading the source code, but not all. As you
> > > know, the PieceTable is easy to understand, but the source code is not
> > > as easy.
> > >
> > > Sorry for my bad English!
> >
> > It's nice IMO. I'm sorry for my English too :)
> >
> > I should say that I'm not fluent in the internals of AW at all - I don't
> > know anything about PieceTable etc. And this knowledge is not required.
> > I think you should look at my patch that added support for non-latin1
> > singlebyte characters: http://www.hippo.ru/~hvv/abiword/awrus-patch.gz
> > to see what changes were needed. Some portions of this patch affect
> > GUI-oriented things (improve dialog layout, for example) but they are
> > not language- or encoding-specific, so they can effectively be ignored.
> > So, you don't need to know gtk at all to add support for CJK to AW.
>
> I am glad to hear this. :-) But I still need to learn some other things.

 Moreover, this (or the previous) week someone announced the existence of
 patches for Chinese support in AW! (It seems this is your first target.)
 Here is the location of the patch:
        http://www.hj.webprovider.com/develope/index.html
 I will forward that message, with the URL of a screenshot, to you too.

> > The key component of that patch is the XAP_EncodingManager class, which
> > provides information about encodings and allows converting between the
> > native locale's encoding and Unicode. It's somewhat simple-minded since it
> > assumes single-byte locales (or at least the code using it assumes so). It
> > supports conversion of a single character. That model is not sufficient for
> > CJK texts since they encode each character using a shift state or
> > whatever - so methods for converting to/from a string (still one symbol at
> > a time, but maintaining the shift state) need to be implemented and used
> > instead of the present ones. Nearly all places that call
> > XAP_EncodingManager.* should be modified to use the functions for
> > converting to/from a string while maintaining state. Probably new classes
> > like "XAP_EncodingManager::UToNativeConverter" and
> > "XAP_EncodingManager::NativeToUConverter" should be created that will be
> > wrappers around the iconv() function and used instead of
> > XAP_EncodingManager's methods. I will sketch these classes on Saturday.
> >
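[Editorial sketch] The stateful conversion described above can be illustrated
in Python rather than against AbiWord's own code. This is only a sketch under
stated assumptions: Python's standard `codecs` incremental decoder stands in
for the proposed iconv()-based converter classes, and the function names
`decode_stateful` and `decode_stateless`, the one-byte chunking, and the choice
of ISO-2022-JP as the example encoding are all hypothetical, not taken from
the patch.

```python
import codecs


def decode_stateful(chunks, encoding):
    """Decode byte chunks with ONE decoder that keeps the shift state between calls."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # Flush the decoder; this raises if the input ended in the middle of a character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


def decode_stateless(chunks, encoding):
    """Decode each chunk on its own, forgetting any shift state (the single-byte model)."""
    return "".join(chunk.decode(encoding, errors="replace") for chunk in chunks)


text = "日本語"
# ISO-2022-JP is a stateful encoding: escape sequences switch the active character set.
data = text.encode("iso2022_jp")
# Hypothetical worst case: the input arrives one byte at a time.
chunks = [data[i:i + 1] for i in range(len(data))]

print(decode_stateful(chunks, "iso2022_jp") == text)
print(decode_stateless(chunks, "iso2022_jp") == text)
```

The first function recovers the original text because the decoder object
remembers which character set is active; the second does not, which is the
failure a per-character, stateless interface would run into.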
> > As for the GUI - does GTK support CJK acceptably (i.e. do some generic
> > apps support CJK)? For example, does gtk correctly draw CJK texts in
> > labels on buttons?
>
> Yes, GTK supports CJK. (As you know, some people in China like me are
> trying their best to localize AbiWord, and they are making progress.
> AbiWord can now use Big5 TrueType fonts very well. The only problem with
> GB2312 is that AbiWord doesn't show anything on screen if there is a
> GB2312 character. :-( I guess something is wrong with the way GTK is being
> used.) I have never tried Japanese and Korean, but I found that
> GTK+-1.2.8 has files named "gtkrc.ko" and "gtkrc.ja". Since GB2312 and
> Big5 work well, I guess KSC5601 and JISX0208 work well too. :-)

 Nice to hear that. As I understand it, there exist some patches to AW that
make it partially support CJKV?
 As for the reason GB2312 is not shown - that is how non-patched AW works,
it is not GTK :(
 
> >
> > What is your favourite development platform? Mine is linux, and I hope
> > yours is some unix too. This is somewhat important since my patch adds
> > full support for non-latin1 only on the unix platform (though only a
> > little effort is required to add support for the other platforms AW
> > supports).
> >
>
> My favourite development platform is linux, just like yours. I really love
> this platform although I started using it not very long ago.

 Very nice to hear that.
 
> > I think it would be nice if our conversation was put in downloadable form
> > on the Web, since it can contain "wise thoughts" that would be useful for
> > developers who decide to add CJK support to other apps. Would you mind if
> > our communications became publicly available at some point?
> >
> Sure, it is really a good idea. I hope many people can join this
> project.

 I plan to forward all messages from our conversation to the AW mailing list
first. I hope you won't mind this either.

> > Do CJKV versions of Microsoft Word exist? If they don't exist, there
> > won't be a need to bother with exporting and importing CJKV documents. If
> > they exist, could you send some very small rtf document in CJK? Are
>
> Yes, MS Word has CJKV versions. I must admit that it supports CJKV very
> well, although it is painful to wait for MS Word to start up.
> Because I only have the GB2312 version of MS Word, I have no way to give
> you CJKV documents other than GB2312. I have no place to upload this
> document, so I have to attach it to this letter. Luckily, it is not very
> big.

 Yes, it's very small. Thanks for it.
 You mentioned the GB2312 version of MSWord - are there other versions of
MSWord that support other CJKV encodings? If yes, can they correctly read
files produced by MSWord versions that support different encodings?

> > Do CJK equivalents of the Type1 fonts corresponding to Helvetica, Times
> > New Roman and the others supplied with AW exist (you can look at
> > ghostscript - that's the piece of software that most probably will support
> > them)? How many glyphs (characters or hieroglyphs) are in CJKV fonts? Does
> > Ghostscript work OK with CJKV at all? If yes, could you please put some
> > very small document in PS format up for download (and compress it)? Does
> > the TeX typesetting system work with CJK at all (there is a latex exporter
> > in AW)?
> >
>
> As far as I know, ghostscript 6.01 supports CJKV TrueType fonts (this may
> be useful to you: http://www.aihara.co.jp/~taiji/tops/ ). But its license
> is not GPL, so we have to give it up. :-( gs5.05 only supports Type1
> fonts. We have to generate these fonts from TTF, which makes the gs5.05
> package very big.
> Most CJKV fonts are TrueType fonts. I think using TTF is the easy way,
> IMO. Of course, we can generate Type1 fonts from TTF.

  It's nice that gs supports CJKV in some form (for some fonts).

> According to some resources, GB2312-80 has 7,445 characters, Big5 has
> 13,494 characters, JISX 0208-1990 has 6,879 characters, and KSC 5601-1992
> has 8,224 characters.
> These are all basic character sets. Their supersets have more characters.
> GB2312's superset, GBK, has 21,886 characters. Sounds like a nightmare?

  I'm glad I don't have to study CJKV languages :)
  As long as all the CJKV supersets can be represented in UCS-2, it's OK :)
  Even if they don't - that's not fatal.
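[Editorial sketch] Whether a given text fits in UCS-2 can be checked by
converting it to Unicode and testing that every code point fits in 16 bits.
The helper name `fits_in_ucs2` and the sample strings are assumptions chosen
for illustration; the check says nothing about any particular font or about
AbiWord's internal representation.

```python
def fits_in_ucs2(text):
    """True if every character is at most U+FFFF, i.e. fits in one 16-bit unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Round-trip a short sample through the GB2312 codec, then test the result.
gb_bytes = "中文".encode("gb2312")
decoded = gb_bytes.decode("gb2312")

print(fits_in_ucs2(decoded))
# U+20000 lies outside the 16-bit range, so it cannot be a single UCS-2 unit.
print(fits_in_ucs2("\U00020000"))
```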

> Two months ago, I made gs5.05 show CJKV fonts. But it needs Type1 font
> support.
> So, even if I send a ps document to you, you still won't be able to read it.

  I just wanted to see what PS prolog is emitted to allow CJKV fonts in ps
files. So, if you can generate a small .ps file not containing the fonts
themselves, I would like to see it (not in gv, just the raw .ps).

> I am sorry that I know nothing about TeX. Anyway, I am a newbie to
> linux.

 It's a typesetting system used by scientists (source file extensions are
.tex, .latex, and the output format is .dvi, which is converted to .ps using
the 'dvips' utility). Chances are very low that it supports CJKV..

> > The biggest problem would be printing - the generation of a PS prologue
> > that will make the fonts work in GS.
> >
> Yes, we need GS support.
>
> > Also, are there X font servers that can display CJKV Type1 fonts (most
> > probably, yes)?
>
> Yes, there are many X font servers that can display CJKV fonts.

  Can they display Type1 fonts too?

> >
> > Also, I have the impression that support for Vietnamese is already
> > available in AW due to my non-latin1 singlebyte characters megapatch,
> > since all Vietnamese characters can be represented as one byte.
> >
> > Also, as I remember, CJK "words" can be wrapped at any letter (so layout
>
> Sorry for my poor English, I can't understand what you said. What do you
> mean here?

 For English, if there is not enough space at the end of the current line for
some word, that line is "finished" and the next line starts with that word.
Of course, that is if hyphenation is disabled.
 For Japanese, as I remember, the rules allow you to put as many letters of
that word as fit on the current line, and the remaining letters of that word
start on the next line (i.e. the rules allow you to break a word at any
character).
 So, for example, for monospaced fonts and English, if there are 4 cells left
on the current line and the next word is "abiword", those 4 cells will remain
empty since "abiword" is 7 characters long, and the next line will start with
"abiword". For Japanese in the same situation, "abiw" would be placed on the
current line and "ord" would go to the next line.

 So, do the CKV rules allow breaking words at any letter?
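
[Editorial sketch] The two wrapping behaviours described above can be compared
on a monospaced grid. This is a simplification under stated assumptions: the
function names `wrap_at_words` and `wrap_anywhere` are hypothetical, every
character is taken to occupy one cell, and real CJK typesetting has further
rules (for example, which punctuation may start or end a line) that are
ignored here.

```python
def wrap_at_words(text, width):
    """Break only at spaces: a word that does not fit moves whole to the next line."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= width:
            current = candidate
        else:
            if current:
                lines.append(current)
            current = word  # an over-long word simply overflows in this sketch
    if current:
        lines.append(current)
    return lines


def wrap_anywhere(text, width):
    """Break after any character: fill each line completely before starting the next."""
    return [text[i:i + width] for i in range(0, len(text), width)]


# A 10-cell line: after "hello " there are 4 cells left for the 7-letter "abiword".
print(wrap_at_words("hello abiword", 10))
print(wrap_anywhere("hello abiword", 10))
```

With word wrapping the 4 remaining cells stay empty and "abiword" starts the
next line; with break-anywhere wrapping the first line ends in "abiw" and the
second holds "ord".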

> > logic should respect this). Am I right? What other languages from CJKV
> > allow wrapping? I think the core AW developers will make that change
> > themselves if we ask nicely, so there is no need to study AW layout
> > internals at all IMO - others will fix the logic for us.
> >
> > Also, is there any sense of "case" for CJKV "characters" - i.e. are there
> > uppercased and lowercased versions of the same character? Probably some
> > AW logic could depend on this distinction.
>
> As far as I know, there is no sense of case for CJKV characters. But the
> same character may appear in different places of a character set. Not
> many, just a few.

 OK.
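
[Editorial sketch] The absence of case can be checked against the Unicode
character database with Python's standard `unicodedata` module. The helper
name `has_case` and the sample characters are assumptions for illustration.

```python
import unicodedata


def has_case(ch):
    """True if upper- or lowercasing the character changes it."""
    return ch.upper() != ch or ch.lower() != ch


# Latin and Cyrillic letters, then a Chinese, a Japanese and a Korean character.
for ch in ["a", "Я", "中", "あ", "한"]:
    print(ch, unicodedata.category(ch), has_case(ch))
```

One caveat: CJK character sets also contain full-width Latin letters such as
"Ａ", and those do have upper- and lowercase forms.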
 
> >
> > So, could you please answer these questions.
> > Thanks for your efforts.
> >
>
> It is my pleasure.
>
> Cheers!
> > P.S.: You could simply call me Vlad, without surname.

 Thank you for your answers.

 Best regards,
  -Vlad


