Re: request for help from CJK hackers


Subject: Re: request for help from CJK hackers
From: Vlad Harchev (hvv@hippo.ru)
Date: Thu Nov 09 2000 - 04:42:57 CST


On Thu, 9 Nov 2000, Chih-Wei Huang wrote:

 Hello,

> Vlad Harchev wrote:
> >
> > > Here is my quick fix for GB2312&Big5:
> > >
> > > if (strcmp((char*)pKeyword, "ansicpg") == 0)
> > > {
> > >     if (param == 950)        /* Big5 codepage; 0x404 is the zh-TW language id */
> > >         m_mbtowc.setInCharset(XAP_EncodingManager::instance->charsetFromCodepage((UT_uint32)0x404));
> > >     else if (param == 936)   /* GB2312 codepage; 0x804 is the zh-CN language id */
> > >         m_mbtowc.setInCharset(XAP_EncodingManager::instance->charsetFromCodepage((UT_uint32)0x804));
> > >     else
> > >         m_mbtowc.setInCharset(XAP_EncodingManager::instance->charsetFromCodepage((UT_uint32)param));
> > > }
> > > Of course, this still needs work. I am not familiar with your class, Vlad.
> > > I can't convert a codepage (CP936 or CP950) to a charset.
> >
> > Yes, your logic is correct, but it should be moved to
> > XAP_EncodingManager::charsetFromCodepage()
> > - I will do it cleanly today. For now, use your solution for testing (and
> > share it with other CJK hackers).
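
A minimal sketch of what centralizing this mapping in one function might look like. This is a stand-alone illustration, not the real XAP_EncodingManager code: the function name charsetFromAnsiCodepage and the returned charset names are assumptions chosen for the example.

```cpp
#include <string>

// Hypothetical stand-alone helper (NOT the actual AbiWord API):
// map the numeric argument of RTF's \ansicpg keyword to a charset name.
std::string charsetFromAnsiCodepage(unsigned int codepage) {
    switch (codepage) {
        case 936:
            return "GB2312";  // Simplified Chinese (assumed name)
        case 950:
            return "BIG5";    // Traditional Chinese (assumed name)
        default:
            // Fall back to a generic "CP<number>" name, e.g. "CP1251".
            return "CP" + std::to_string(codepage);
    }
}
```

With a helper of this kind the \ansicpg handler needs no special cases of its own; it just passes param through.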
>
> After applying this fix, I can now import Chinese RTF files
> previously exported by AW. However, there are still some serious
> problems:
>
> 1. Some Chinese characters are eaten or misinterpreted.
> After a quick analysis, I found that when the highest bit of the second
> byte of a Big5 character is 0, the character is exported incorrectly.
> For example, the character '¤J' 0xa44b
> is saved as \'a4J. Even if I hack it into \'a4\'4b in the RTF,
> the import is still incorrect.

 Hmm, that seems to be a weird problem. Could you please apply the fix to the RTF
importer that I posted a couple of minutes ago? I can't think of a reason why it
would behave this incorrectly for you.
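
 To illustrate the two spellings discussed above, here is a minimal sketch of a decoder that turns \'hh escapes into raw bytes and copies literal characters unchanged, so that a lead byte written as \'a4 followed by a literal 'J' yields the same byte pair as two escapes. The function names are hypothetical and this is not the code in ie_imp_RTF.cpp; the resulting bytes would still have to go through the multibyte-to-wide-char conversion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode every \'hh escape into the byte it names; copy all other
// characters through unchanged. Control words are not interpreted.
std::string decodeRtfHexEscapes(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 3 < in.size() && in[i + 1] == '\'') {
            int hi = hexValue(in[i + 2]);
            int lo = hexValue(in[i + 3]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 3;  // skip the quote and both hex digits
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

 Under this sketch, \'a4J and \'a4\'4a decode to the same two bytes, 0xA4 0x4A.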

> 2. The exported RTF cannot be read by MSWord 2000.
> None of the Chinese characters are displayed.

 Try the following:
Replace every \fcharset0 with \fcharset134 (if your text is in GB2312), or
change the argument of \fcharset to whatever value Word 2000 uses.
  Also, please save a small RTF file exported by AW and an RTF file saved by
Word 2000, and send them both to me.
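
 The substitution can be scripted. Below is a minimal sketch, assuming the whole RTF file has been read into a string; the function name replaceDefaultFcharset is made up for this example. In RTF, 134 is the \fcharset value for GB2312 and 136 is the one for Big5.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: replace each "\fcharset0" control word with
// "\fcharset<newCharset>". Words such as "\fcharset01" are left alone.
std::string replaceDefaultFcharset(const std::string& rtf, int newCharset) {
    const std::string needle = "\\fcharset0";
    const std::string replacement = "\\fcharset" + std::to_string(newCharset);
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t hit = rtf.find(needle, pos);
        if (hit == std::string::npos) {
            out.append(rtf, pos, std::string::npos);  // copy the tail
            break;
        }
        std::size_t end = hit + needle.size();
        // If more digits follow, the number is not exactly 0: keep it.
        bool moreDigits = end < rtf.size() &&
                          std::isdigit(static_cast<unsigned char>(rtf[end]));
        out.append(rtf, pos, hit - pos);
        out += moreDigits ? needle : replacement;
        pos = end;
    }
    return out;
}
```

 For Big5 text, the same call with 136 instead of 134 would apply.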

> 3. I created an RTF file with MSWord 2000 and opened it in AW;
> AW crashed immediately:
>
> ** ERROR **: file ie_imp_RTF.cpp: line 492 (UT_Bool
> IE_Imp_RTF::PopRTFState()): assertion failed: (pState != NULL)
> aborting...
> Aborted

 It seems AW encountered an extra "}" in the RTF (not that the RTF is bad, but AW didn't
count { and } properly). Please investigate where this happens. I guess it happens
when parsing \fonttbl, i.e. at the very beginning.
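
 One way to check whether the file itself is balanced is a small stand-alone scanner. The sketch below is a simplification and not AW's parser: the function name is hypothetical, it only skips the character after a backslash (so \{, \} and \\ are not counted), and it ignores \bin data. Raw double-byte text whose second byte happens to be '{', '}' or '\' would also confuse it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the offset of the first '}' that has no
// matching '{', or -1 if no such brace exists.
long findUnmatchedCloseBrace(const std::string& rtf) {
    long depth = 0;
    for (std::size_t i = 0; i < rtf.size(); ++i) {
        char c = rtf[i];
        if (c == '\\') {
            ++i;  // skip the escaped character or first keyword letter
            continue;
        }
        if (c == '{') {
            ++depth;
        } else if (c == '}') {
            if (depth == 0) return static_cast<long>(i);
            --depth;
        }
    }
    return -1;
}
```

 If this returns -1 for the Word 2000 file, the file's groups are balanced and the miscount is on the importer side.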
 
> 4. Copy & paste still doesn't work...
 
 This should be fixed once the fix I posted a couple of minutes ago is applied.

 Just curious: how does AW import Word 2000's .doc files?

>
> --
> ~ Chih-Wei Huang (cwhuang)
> 'v' E-Mail : cwhuang@linux.org.tw
> // \\ CLDP Project : http://www.linux.org.tw/CLDP/ (Coordinator)
> /( )\ CLE Project : http://cle.linux.org.tw/CLE/ (Developer)
> ^`~'^ HomePage : http://www.cwhuang.idv.tw/
>

 Best regards,
  -Vlad


