Re: May I take part in AbiWord CJKV develope? (fwd)


Subject: Re: May I take part in AbiWord CJKV develope? (fwd)
From: Vlad Harchev (hvv@hippo.ru)
Date: Sun Oct 22 2000 - 04:29:54 CDT


---------- Forwarded message ----------
Date: Fri, 20 Oct 2000 10:52:38 +0800
From: Belcon <rainfall@yeah.net>
To: Vlad Harchev <hvv@hippo.ru>
Subject: Re: May I take part in AbiWord CJKV develope?

Dear Vlad:

Vlad Harchev wrote
>
> On Thu, 19 Oct 2000, Belcon wrote:
>
> Hello,
>
> >
> > Dear Vlad Harchev :
> >
> > I heard that you are looking for an AbiWord CJKV developer. I am not an
> > expert, but I would be glad to do something for this project.
>
> Very nice to hear that.
>
> > I know something about CJKV (GB2312, Big5, SJIS, JIS-EUC, KSC), and I have
> > spent one month on AbiWord's source code. But I know little about
> > gtk+ (I will learn it over the next several months).
> > Would you please show me a way into this project?
>
> Of course, I will try to help you as much as I can.
>
> > BTW: Could you give me some information about AbiWord's
> > processing? I know some of it from reading the source code, but not all.
> > As you know, the PieceTable is easy to understand in concept, but the
> > source code is not as easy to follow.
> >
> > Sorry for my bad English!
>
> It's nice IMO. I am sorry for my English too :)
>
> I should say that I'm not fluent in the internals of AW at all - I don't know
> anything about PieceTable etc. And this knowledge is not required.
> I think you should look at my patch that added support for non-latin1
> singlebyte characters: http://www.hippo.ru/~hvv/abiword/awrus-patch.gz
> to see what changes were needed. Some portions of this patch affect
> GUI-oriented things (improve dialog layout, for example) but they are
> not language- or encoding-specific, so they can effectively be ignored. So, you
> don't need to know gtk at all to add support for CJK to AW.

  I am glad to hear this. :-) But I still need to learn some other things.

> The key component of that patch is XAP_EncodingManager class that provides
> information about encodings and allows to convert between native locale's
> encoding and Unicode. It's somewhat simple-minded since it assumes single-byte
> locales (or at least code using it assumes so). It supports conversion of a
> single character. That model is not sufficient for CJK texts since they encode
> each character using a shift state or whatever - so methods for converting
> to/from strings (still one symbol at a time, but maintaining the shift state)
> need to be implemented and used instead of the present ones. Nearly all places
> that call XAP_EncodingManager.* should be modified to use the functions for
> converting to/from strings while maintaining state. Probably new classes like
> "XAP_EncodingManager::UToNativeConverter" and
> "XAP_EncodingManager::NativeToUConverter" should be created that will be
> wrappers around the iconv() function and used instead of XAP_EncodingManager's
> methods. I will sketch these classes on Saturday.
>
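  A minimal sketch of the stateful, string-oriented conversion described
above, written in Python for brevity rather than AbiWord's C++. It does not
use XAP_EncodingManager or iconv(); Python's incremental decoder stands in for
a converter that keeps state between calls, and the helper name decode_stream
is hypothetical. The point is only that a multibyte encoding such as GB2312
cannot be converted one byte at a time unless state is carried along.

```python
import codecs

def decode_stream(chunks, encoding):
    """Decode byte chunks to text, carrying decoder state across chunks."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True makes the decoder raise if the input ended mid-character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)

text = "AbiWord \u4e2d\u6587"      # "AbiWord" followed by two Chinese characters
raw = text.encode("gb2312")       # each Chinese character becomes two bytes

# Worst case for the converter: the input arrives one byte at a time.
one_byte_chunks = [raw[i:i + 1] for i in range(len(raw))]
result = decode_stream(one_byte_chunks, "gb2312")

# A stateless, per-byte conversion fails on the first half of a character.
try:
    raw[-2:-1].decode("gb2312")
    stateless_ok = True
except UnicodeDecodeError:
    stateless_ok = False
```

  A C++ wrapper around iconv() would play the same role as the decoder object
here: it owns the conversion state, so callers can feed it text in pieces.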
> As for GUI - does GTK acceptably support CJK (i.e. do some generic apps
> support CJK)? For example, does gtk correctly draw CJK texts in labels on
> buttons?

  Yes, GTK supports CJK. (As you know, some people in China, like me, are
trying their best to localize AbiWord, and they are making progress. AbiWord
can now use Big5 TrueType fonts very well. The only problem with GB2312 is
that AbiWord doesn't show anything on screen if there is a GB2312
character. :-( I guess AbiWord is using GTK in a wrong way.) I have never
tried Japanese or Korean, but I found that GTK+-1.2.8 has files named
"gtkrc.ko" and "gtkrc.ja". Since GB2312 and Big5 work well, I guess KSC5601
and JISX0208 work well too. :-)

>
> What is your favourite development platform? Mine is linux, and I hope yours
> is some unix too. This is somewhat important since my patch adds full
> support for non-latin1 only for the unix platform (though a little effort is
> required to add support for the other platforms AW supports).
>

  My favourite development platform is linux, just like yours. I really love
this platform, although I started using it not very long ago.

> I think it would be nice if our conversation was put in downloadable form on
> the Web, since it can contain "wise thoughts" that would be useful for
> developers that decide to add CJK support to other apps. Would you
> mind if our communications became publicly available at some point?
>
  Sure, it is really a good idea. I hope many people can join this
project.

> Do CJKV versions of Microsoft Word exist? If they don't exist, there
> won't be any need to bother with exporting and importing CJKV documents. If
> they exist, could you send some very small rtf document in CJK? Do
 
  Yes, MS Word has CJKV versions. I must admit that it supports CJKV very
well, although it is painful to wait for MS Word to start up.
  Because I only have the GB2312 version of MS Word, I have no way to give
you CJKV documents other than GB2312. I have no place to upload this
document, so I have to attach it to this letter. Luckily, it is not very big.

> CJK-equivalents of Type1 fonts corresponding to Helvetica, Times New Roman and
> others supplied with AW exist (you can look at ghostscript - that's the
> piece of software that most probably will support them)? How many glyphs
> (characters or hieroglyphs) are in CJKV fonts? Does Ghostscript work OK with
> CJKV at all? If yes, could you please put some very small document in PS
> format for download (and compress it)? Does the TeX typesetting system work
> with CJK at all (there is a LaTeX exporter in AW)?
>

  As far as I know, ghostscript 6.01 supports CJKV TrueType fonts (there may
be something useful for you here: http://www.aihara.co.jp/~taiji/tops/ ). But
its license is not GPL, so we have to give it up. :-( gs5.05, however, only
supports Type1 fonts. We would have to generate these fonts from TTF, which
makes the gs5.05 package very big.
  Most CJKV fonts are TrueType fonts. I think using TTF is the easy way, IMO.
Of course, we can generate Type1 fonts from TTF.
  According to some sources, GB2312-80 has 7,445 characters, Big5 has 13,494
characters, JISX 0208-1990 has 6,879 characters, and KSC 5601-1992 has 8,224
characters. These are all basic character sets. Their supersets have more
characters. GB2312's superset, GBK, has 21,886 characters. Sounds like a
nightmare?
  Two months ago, I made gs5.05 display CJKV fonts. But it needs Type1 font
support. So even if I send a PS document to you, you still can't read it.
  I am sorry that I know nothing about TeX. Anyway, I am a newbie to
linux.

> The biggest problem would be printing - the generation of a PS prologue that
> will make fonts work in GS.
>
  Yes, we need GS support.

> Also, are there X font servers that can display CJKV Type1 fonts (most
> probably, yes)?
  
  Yes, there are many X font servers that can display CJKV fonts.
>
> Also, I have an impression that support for Vietnamese is already available
> in AW due to my non-latin1 singlebyte characters megapatch since all
> Vietnamese characters can be represented as one byte.
>
> Also, as I remember, CJK "words" can be wrapped at any letter (so layout

  Sorry for my poor English, I can't understand what you said. What do you
mean here?

> logic should respect this). Am I right? What other languages from CJKV allow
> wrapping? I think core AW developers will make that change themselves if we ask
> nicely, so there is no need to study AW layout internals at all IMO - others
> will fix the logic for us.
>
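  To make the wrapping question concrete, here is a deliberately simplified
Python sketch. It assumes that a line may break after a space, or on either
side of a CJK ideograph (taken here as the range U+4E00 to U+9FFF only). Real
layout rules are stricter (for example, punctuation that may not start a
line), and nothing here reflects AbiWord's actual layout code; the function
names are hypothetical.

```python
def is_cjk_ideograph(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def break_opportunities(text):
    """Return every index i such that a line may break before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        # Latin-style rule: break only after a space.
        # CJK-style rule: break on either side of an ideograph.
        if prev == " " or is_cjk_ideograph(prev) or is_cjk_ideograph(cur):
            breaks.append(i)
    return breaks

latin_breaks = break_opportunities("ab cd")
cjk_breaks = break_opportunities("\u4e2d\u6587\u5b57")   # three ideographs
mixed_breaks = break_opportunities("a\u4e2db")           # ideograph between letters
```

  With these rules the Latin text can only break between the two words, while
the CJK text can break between any two characters.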
> Also, is there any sense of "case" for CJKV "characters" - i.e. are there
> uppercased and lowercased versions of the same character? Probably some
> AW logic could depend on this distinction.

  As far as I know, there is no sense of case for CJKV characters. But the
same character may appear in more than one place in a character set. Not
many, just a few.
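  A quick check of the point about case, as a Python sketch. It relies on
Python's Unicode tables and its gb2312 codec, not on anything in AbiWord, and
the helper name has_case is hypothetical. One caveat not raised in the thread:
CJK character sets such as GB2312 also contain fullwidth Latin letters, and
those do have case, so case-dependent logic may still see cased characters in
CJK text.

```python
import unicodedata

def has_case(ch):
    """True if upper- or lowercasing changes the character."""
    return ch.upper() != ch or ch.lower() != ch

ideograph = "\u4e2d"      # a common Chinese character
fullwidth_a = "\uff21"    # FULLWIDTH LATIN CAPITAL LETTER A

ideograph_category = unicodedata.category(ideograph)      # "Lo": letter, other
fullwidth_category = unicodedata.category(fullwidth_a)    # "Lu": letter, uppercase

ideograph_has_case = has_case(ideograph)
fullwidth_has_case = has_case(fullwidth_a)

# The fullwidth letter is part of GB2312 itself, as a two-byte character.
fullwidth_in_gb2312 = fullwidth_a.encode("gb2312")
```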

>
> So, could you please answer these questions.
> Thanks for your efforts.
>

  It is my pleasure.

Cheers!
> P.S.: You could simply call me Vlad, without surname.
>
> > Cheers!
> >
>
> Best regards,
> -Vlad



This archive was generated by hypermail 2b25 : Sun Oct 22 2000 - 04:57:59 CDT