Re: request for help from CJK hackers


Subject: Re: request for help from CJK hackers
From: Vlad Harchev (hvv@hippo.ru)
Date: Thu Nov 09 2000 - 00:12:12 CST


On Thu, 9 Nov 2000, Belcon Zhao wrote:

 Hello Belcon,

>
> Hello Vlad
>
> >From: Vlad Harchev <hvv@hippo.ru>
> >To: abiword-dev@abisource.com
> >CC: Belcon Zhao <belcon@hotmail.com>, Belcon <rainfall@yeah.net>,
> >hashao <hashao@china.com>, Chih-Wei Huang <cwhuang@linux.org.tw>, hj
> ><huangj@citiz.net>
> >Subject: request for help from CJK hackers
> >Date: Wed, 8 Nov 2000 23:13:46 +0400 (SAMT)
> >
> > Hi guys,
> >
> > It seems AbiWord-0.7.12 will be released at the beginning of next week,
> >so it would be nice if all CJK issues were worked out.
>[...]
> Yeah, I got it. Here is the reason why m_mbtowc always returns 1, Vlad.
> After adding debug messages, I found that param here is just 936 for
> GB2312. We read "\ansicpg936" (for GB2312) from the RTF file, separate
> 936 from the keyword, and set param=936. But I don't think that is the
> value the code expects, IMHO. Vlad, I guess you want param=0x804 (for
> GB2312) or 0x404 (for Big5), so that
> XAP_EncodingManager::instance->charsetFromCodepage((UT_uint32)param)
> returns GB2312 or Big5. But if param=936, it ***always*** returns CP1252,
> so our character set is chosen incorrectly.
> Vlad, I am just curious how your Russian characters work fine. :-)
> As far as I know, it should return CP1251 for Russian.

 Thanks for this analysis. BTW, it does return CP1251 for Russian: for
\ansicpg1251 it just concatenates "CP" and "1251", which is no problem since
glibc knows cp1251 under that name.
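
 One way to check whether the C library knows a given charset name is to ask
iconv directly. The following is only a minimal sketch: the helper name
isKnownToIconv is hypothetical (it is not part of AbiWord), and using UTF-8 as
the target encoding is an assumption made purely for the probe.

```cpp
#include <iconv.h>

// Hypothetical helper, not AbiWord code: returns true if iconv accepts
// `charset` as a source encoding. UTF-8 as the target is an arbitrary choice.
bool isKnownToIconv(const char* charset)
{
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1)
        return false;   // name not recognised (or conversion unsupported)
    iconv_close(cd);
    return true;
}
```

 Calling it with names such as "CP1251", "GB2312" or "BIG5" shows which
spellings the installed libc accepts; the answer depends on the system, so no
particular result is assumed here.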

> Here is my quick fix for GB2312&Big5:
>
> if (strcmp((char*)pKeyword, "ansicpg") == 0)
> {
>     if (param == 950)       // CP950: Big5
>         m_mbtowc.setInCharset(XAP_EncodingManager::instance->charsetFromCodepage((UT_uint32)0x404));
>     else if (param == 936)  // CP936: GB2312
>         m_mbtowc.setInCharset(XAP_EncodingManager::instance->charsetFromCodepage((UT_uint32)0x804));
>     else
>         m_mbtowc.setInCharset(XAP_EncodingManager::instance->charsetFromCodepage((UT_uint32)param));
> }
> Of course, this still needs work. I am not familiar with your class, Vlad.
> I can't convert a codepage (CP936 or CP950) to a charset name.

 Yes, your logic is correct, but it should be moved into
XAP_EncodingManager::charsetFromCodepage(). I will do it cleanly today. For
now, use your solution for testing (and share it with other CJK hackers).
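
 As an illustration of what folding the special cases into a single lookup
could look like, here is a rough sketch. It is not the real
XAP_EncodingManager::charsetFromCodepage(); the function name
charsetNameForAnsiCodepage is hypothetical, and it assumes the input is the
number taken from \ansicpg and that the libc accepts the names "GB2312" and
"BIG5".

```cpp
#include <string>

// Hypothetical stand-in for the lookup discussed above; not AbiWord code.
// Input: the numeric parameter of the RTF \ansicpg keyword.
std::string charsetNameForAnsiCodepage(unsigned int codepage)
{
    switch (codepage)
    {
    case 936: return "GB2312";  // assumed libc name for CP936
    case 950: return "BIG5";    // assumed libc name for CP950
    default:
        // Same rule that already works for Russian: "CP" + number.
        return "CP" + std::to_string(codepage);
    }
}
```

 With this shape the RTF importer would not need any per-codepage branches;
it would pass param through and hand the returned name to setInCharset().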

> But there is still the problem that I reported to you yesterday, namely
> the order of English and Chinese characters. I am debugging it
> now. If I get a result, I will report it to you.

 Here is an explicit description of the problem:

 when the sequence "AbiWord**" (where ** are Chinese chars) is cut and then
pasted, it's pasted as "**AbiWord", i.e. the Chinese chars are moved in front
of the English ones. I don't know whether the Chinese chars get reversed.

 My questions:
* Do CJK chars get reversed (i.e. was ABCD and becomes DCBA?) or are they
just moved in front of the English chars in their original order?
* Save the file as .rtf - do the CJK chars get moved in it?
* Manually save the same file as .txt and as .abw - is there the same problem
 with CJK chars?
* Also save as HTML and LaTeX - is there the same problem with CJK chars?

 If CJK chars are moved when saved in any of these file formats, then it's a
problem with the AW core, and I have no idea what could be causing it.
 If not, then I hope someone has a clue about the reasons for this funny
behaviour.
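
 To answer the "reversed or just moved" question mechanically, the text before
and after the paste can be compared code point by code point. This is only a
sketch under assumptions: the names CjkOrder, nonAscii and classifyCjkOrder are
hypothetical, both strings are assumed to be available as UTF-32, and every
code point above 0x7F is treated as a CJK char.

```cpp
#include <algorithm>
#include <string>

enum class CjkOrder { Preserved, Reversed, Scrambled };

// Collects the non-ASCII code points of `text`, keeping their order.
static std::u32string nonAscii(const std::u32string& text)
{
    std::u32string out;
    for (char32_t c : text)
        if (c > 0x7F)
            out.push_back(c);
    return out;
}

// Compares the non-ASCII run before and after cut-and-paste.
// Scrambled also covers the case where characters were lost or added.
CjkOrder classifyCjkOrder(const std::u32string& before,
                          const std::u32string& after)
{
    std::u32string a = nonAscii(before);
    std::u32string b = nonAscii(after);
    if (a == b)
        return CjkOrder::Preserved;
    std::reverse(b.begin(), b.end());
    return (a == b) ? CjkOrder::Reversed : CjkOrder::Scrambled;
}
```

 The test string needs at least two different CJK chars; with a single char or
a palindrome, "preserved" and "reversed" cannot be told apart and the function
reports Preserved.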

>
>
> > So you should ensure that XAP_EncodingManager::instance->
> >charsetFromCodepage((UT_uint32)param) returns the name of a charset libc
> >knows. If it returns a charset name unknown to glibc, just tell me what it
> >should return for which parameter (and what it actually returns) and I will
> >correct it properly (or do it yourself - in
> >/src/af/xap/xp/xap_EncodingManager.cpp) - but write a quick hack in order
> >not to wait for the correct fix (from you or me), and test it. Test cut and
> >paste after this.
> >
> >Also, please test (and fix :) the following:
> >* cutting from AW and pasting to other apps
> >* pasting to AW from other apps

 Do these things work fine?

[...]
> BTW: CJK fonts are not Type1 fonts. They are TrueType fonts. So the CJK
> version of AW depends on which TrueType fonts you have installed on your
> system.
> Should we add a function to install TrueType fonts in AW automatically? Just
> my opinion.

 Hmm, I don't understand you - what do you mean by "install automatically"?
 I assume you propose to ship subdirectories with CJK fonts.dir files with AW -
right? If everything that is required for Chinese (and probably all CJK
fonts) is a subdirectory with a fonts.dir in it, these directories could be
added to the AW distribution. So I agree with this.
 Then prepare the fonts.dir files for GB2312 and Big5 that you think should be
in the AW distribution and post a patch that adds them (or better - discuss the
directory name with me and then post a patch).
 I will post a proposal for including CJK fonts.dir files with AW to the list
explicitly later today.

 Thanks!
 
> Best regards!
> -Belcon

 Best regards,
  -Vlad


