Re: CJK patch test error report!


Subject: Re: CJK patch test error report!
From: Vlad Harchev (hvv@hippo.ru)
Date: Wed Nov 01 2000 - 00:24:45 CST


On Wed, 1 Nov 2000, Belcon wrote:

 Hi Belcon, Hi abi-dev

> Hello:
>
> Vlad Harchev wrote:
>
> [deleted]
> >
> > Nice hack :)
> > Also I would like you to try the latest CJK patch (it's all in one) that will
> > be announced shortly after this letter. It uses gdk_fontset_load instead of
> > gdk_font_load so your hack probably won't be needed. Could you test whether
> > it works?
> >
>
> There is still a bug when copy&paste.

 But does that patch fix the problem of displaying Chinese in GB* (without your
hack)?

> [deleted]
> >
> > Definitely, adding {"CP950","GB2312" } is the right thing, so it should
> > definitely be there. Then you can copy Chinese to the clipboard :)
> > As for pasting, I've changed the RTF exporter a little too. Chances are low
> > that it helps with properly importing RTF exported by AW (i.e. copying and
> > pasting), but I see no other flaws in the exporter/importer combination. So
> > please try it.
> >
>
> I found that CP950 stands for Big5 encoding after I read some documents,
> while CP936 stands for GBK encoding. I don't know why
> wvLIDToCodePageConverter(0x404) returns CP950 when I am using GB2312
> encoding. :-(

 It's hardcoded into wvLIDToCodePageConverter - that's why. :)
 I'll fix this: I will add another lookup table that maps zh_CN.GB2312 to
 language code 0x804, and all will be fine.
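
 A minimal sketch of what such a lookup could look like, assuming a small
 static table. All names here (LocaleLid, lidForLocale, codepageForLid,
 ansicpgForLocale) are hypothetical and are not the actual AbiWord or wv code;
 the zh_TW.Big5 entry and the 1252 fallback are assumptions added only for
 illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <cstring>

// Hypothetical table: locale name -> Windows language ID (LID).
struct LocaleLid { const char* locale; unsigned lid; };
static const LocaleLid kLocaleToLid[] = {
    {"zh_CN.GB2312", 0x804},  // Simplified Chinese
    {"zh_TW.Big5",   0x404},  // Traditional Chinese (assumed entry)
};

// Hypothetical table: LID -> code page name, in the "CPnnn" form.
struct LidCodepage { unsigned lid; const char* codepage; };
static const LidCodepage kLidToCodepage[] = {
    {0x804, "CP936"},  // GBK, which covers GB2312
    {0x404, "CP950"},  // Big5
};

// Returns the LID for a locale name, or 0 if the locale is not in the table.
unsigned lidForLocale(const char* locale) {
    assert(locale != nullptr);
    for (const LocaleLid& e : kLocaleToLid)
        if (std::strcmp(e.locale, locale) == 0) return e.lid;
    return 0;
}

// Returns the code page name for a LID, or nullptr if it is unknown.
const char* codepageForLid(unsigned lid) {
    for (const LidCodepage& e : kLidToCodepage)
        if (e.lid == lid) return e.codepage;
    return nullptr;
}

// Returns the number to write after \ansicpg, or `fallback` when the
// locale is unknown or the code page name is not of the form "CPnnn".
int ansicpgForLocale(const char* locale, int fallback = 1252) {
    const char* name = codepageForLid(lidForLocale(locale));
    if (name == nullptr) return fallback;
    if (std::toupper(static_cast<unsigned char>(name[0])) != 'C' ||
        std::toupper(static_cast<unsigned char>(name[1])) != 'P')
        return fallback;
    if (!std::isdigit(static_cast<unsigned char>(name[2]))) return fallback;
    return std::atoi(name + 2);
}
```

 With such a table, a zh_CN.GB2312 locale would no longer end up with the
 Big5 code page in the RTF header.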

 I can post this patch only in 8 hours, sorry. For now, the absence of this
patch doesn't hurt at all (except for making RTFs with CJK unreadable by old
versions of Word).

> > If it still doesn't work, then save the RTF file and inspect it by eye.
> > Also try saving exactly the same text in GB2312 in Word or another app.
> > I recommend putting a recognizable English phrase around, say, 2 Chinese
> > characters (e.g. "AbiWord") - this way you'll easily locate them by eye.
> > Then compare the differences (and probably send me the two pieces of text
> > between the two English words - but try to limit it to 2 Chinese characters
> > so I will be able to distinguish them too). Analyse. To test importing RTF
> > (i.e. pasting), just put a breakpoint at UT_Mbtowc::mbtowc, run the importer
> > or paste something, watch whether things work as they should, and fix them :)
> >
>
> Yes, I saved the rtf file and looked at it with my eyes. When I tried to open
> this rtf file, AW only showed "??" for one Chinese character. I think it is
> because AW can't find the proper glyph.
> I find that in src/wp/impexp/xp/ie_exp_RTF.cpp, in
> UT_Bool IE_Exp_RTF::_write_rtf_header(void), something changed when AW
> was upgraded from 0.7.10 to 0.7.11. AW 0.7.10 works fine with rtf while
> AW 0.7.11 does not. Here is the difference that I think is the reason why
> AW 0.7.11 can't show Chinese characters when we open a Chinese rtf file.
> AW 0.7.10:
> UT_Bool IE_Exp_RTF::_write_rtf_header(void)
> {
>     UT_uint32 k,kLimit;
>
>     // write <rtf-header>
>     // return UT_FALSE on error
>
>     _rtf_open_brace();
>     _rtf_keyword("rtf",1);            // major version number of spec version 1.5
>
>     _rtf_keyword("ansi");
> *** _rtf_keyword("ansicpg",1252);     // TODO what CodePage do we want here ??
>
>     _rtf_keyword("deff",0);           // default font is index 0 aka black
>
>     // write the "font table"....
> [deleted]
>
> AW 0.7.11
> [deleted]
>     _rtf_keyword("ansi");
>     UT_Bool wrote_cpg = 0;
>     if (langcode)
>     {
>         char* cpgname = wvLIDToCodePageConverter(langcode);
>         if (UT_strnicmp(cpgname,"cp",2)==0 && UT_UCS_isdigit(cpgname[2]))
>         {
>             int cpg;
>             if (sscanf(cpgname+2,"%d",&cpg)==1)
>             {
>                 _rtf_keyword("ansicpg",cpg);
>                 wrote_cpg = 1;
>             }
>         };
>     };
>     if (!wrote_cpg)
>         _rtf_keyword("ansicpg",1252); // TODO what CodePage do we want here ??
>
>     _rtf_keyword("deff",0);           // default font is index 0 aka black
> [deleted]
> Here are the rtf files generated by AW 0.7.10 and 0.7.11. (There are two
> Chinese characters surrounded by "AbiWord"; the Chinese characters are the
> same, in GB2312.)
> AW 0.7.10
> {\rtf1\ansi\ansicpg1252\deff0
> {\fonttbl
> {\f0\fnil\fcharset0\fprq0\fttruetype Times New Roman;}
> {\f1\fnil\fcharset0\fprq0\fttruetype ar pl sungtil gb;}}
> {\colortbl
> \red0\green0\blue0;}
> \kerning0\cf0\viewkind1\paperw12240\paperh15840\margl1440\margr1440\widowctl
> \sectd\sbknone\colsx360
> \pard{\f0 AbiWord}{\f1\uc0\u20320\uc0\u22909 AbiWord}}
>
> AW 0.7.11
> {\rtf1\ansi\ansicpg950\deff0
> {\fonttbl
> {\f0\fnil\fcharset0\fprq0\fttruetype Times New Roman;}
> {\f1\fnil\fcharset0\fprq0\fttruetype ar pl sungtil gb;}}
> {\colortbl
> \red0\green0\blue0;}
> \kerning0\cf0\viewkind1\paperw12240\paperh15840\margl1440\margr1440\widowctl
> \sectd\sbknone\colsx360
> \pard{\f0 AbiWord}{\f1\'c4\'e3\'ba\'c3}{\f0 AbiWord}}
>
> If I open the rtf file generated by AW 0.7.10 in AW 0.7.11, it works fine.
> So, IMHO, there is something wrong with our RTF part. This also
> causes problems when we use the copy&paste function.

 Thank you for attaching them and for your analysis.

 The old RTF variant is the "simple" one - it won't be understood by stupid
Wordpad and Word 6.0 and below. That's why I made RTF generation work in a
wiser way (that doesn't work for CJK yet :).
 I've attached a small patch that brings back the "stupid" generation of RTFs
(as 0.7.10 did). Please try it.
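
 To make the difference between the two variants concrete, here is a rough
sketch of both escaping styles as they appear in the two RTF files quoted
above. The function names rtfUnicodeEscape and rtfByteEscape are hypothetical
and this is not the exporter's real code; it only reproduces the output format.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// "Simple" variant (as in the 0.7.10 output): one \uc0\uN per character,
// where N is the UCS-2 code written as a signed 16-bit decimal number
// (RTF expects values above 32767 to be written as negative numbers).
std::string rtfUnicodeEscape(unsigned ucs2) {
    assert(ucs2 <= 0xFFFF);  // sketch handles UCS-2 only
    int n = ucs2 > 32767 ? static_cast<int>(ucs2) - 65536
                         : static_cast<int>(ucs2);
    return "\\uc0\\u" + std::to_string(n);
}

// Code page variant (as in the 0.7.11 output): every byte of the
// locale-encoded text is written as \'hh, and the reader is expected to
// decode the bytes using the code page named by \ansicpg in the header.
std::string rtfByteEscape(const std::string& localeBytes) {
    std::string out;
    char buf[8];
    for (unsigned char b : localeBytes) {
        std::snprintf(buf, sizeof buf, "\\'%02x", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

 For the character with UCS-2 code 0x4f60 and GB2312 code 0xc4e3, the first
function gives \uc0\u20320 and the second gives \'c4\'e3. The second form is
only readable when \ansicpg names the code page the bytes are really in, which
is why \ansicpg950 together with GB2312 bytes fails.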
 
> > [deleted]
> > > >
> > > > Also try pasting from AW to AW. Does it work?
> > > >
> > > Many thanks to your help!
> > > When pasting from AW to AW, it doesn't work if I haven't changed CP936;
> > > after I changed CP936, it doesn't show Chinese characters. :-(
> >
> > As you've discovered, you have to add {"CP950","GB2312"} to that table.
>
> It still does not work.
> Here is what I have found. Using copy&paste, as an example, I copy a Chinese
> character, whose UCS2 code is 0x4f60 while its GB2312 code is 0xc4e3, in an
> abw file, and paste it into the same file. Then I save and quit. I use "vi"
> to look at what is really in the abw file. The original Chinese character is
> "&#x4f60;". The pasted Chinese character should also be "&#x4f60;", while
> actually it is "&#xc4;&#xe3;". This makes me think that we forget to
> translate locale-encoded characters to UCS2-encoded characters in the copy
> and/or paste function.
> It is just my thought. And I found this suggestion does not match what I had
> said before. :-( I am not familiar with AW, so maybe you need to take a
> look.

 Thank you for the very clear analysis - that's what I expected from you :)
 OK, please apply the patch to the exporter first and report the results. That
problem with the importer won't arise (that code simply won't be engaged).
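
 For reference, the symptom described above can be reproduced with a small
sketch like the following. It is only an illustration: the function names are
hypothetical, and the two-entry table holds just the two characters from this
thread, where real code would use a full GB2312 converter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Sample GB2312 -> UCS-2 pairs taken from this thread (not a full table).
struct GbEntry { unsigned gb; unsigned ucs2; };
static const GbEntry kGbSample[] = { {0xC4E3, 0x4F60}, {0xBAC3, 0x597D} };

// Formats one character as a numeric character reference, as seen in .abw.
static std::string charRef(unsigned code) {
    assert(code <= 0xFFFF);
    char buf[16];
    std::snprintf(buf, sizeof buf, "&#x%x;", code);
    return buf;
}

// Wrong: treats every byte of the locale-encoded text as its own character.
std::string naiveRefs(const std::string& gbBytes) {
    std::string out;
    for (unsigned char b : gbBytes) out += charRef(b);
    return out;
}

// Right: pairs lead and trail bytes and converts each pair to UCS-2 first.
// ASCII bytes are copied unchanged (no XML escaping in this sketch);
// pairs missing from the sample table, or a dangling lead byte, become '?'.
std::string decodedRefs(const std::string& gbBytes) {
    std::string out;
    for (std::size_t i = 0; i < gbBytes.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(gbBytes[i]);
        if (lead < 0x80) { out += static_cast<char>(lead); continue; }
        if (i + 1 >= gbBytes.size()) { out += charRef('?'); break; }
        unsigned char trail = static_cast<unsigned char>(gbBytes[++i]);
        unsigned gb = (static_cast<unsigned>(lead) << 8) | trail;
        unsigned ucs2 = '?';
        for (const GbEntry& e : kGbSample)
            if (e.gb == gb) ucs2 = e.ucs2;
        out += charRef(ucs2);
    }
    return out;
}
```

 For the bytes 0xc4 0xe3, naiveRefs gives "&#xc4;&#xe3;" (what ended up in the
abw file), while decodedRefs gives "&#x4f60;" (what should be there).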
 
> > >
>
> >
> > I didn't cc Martin, Sam and HJ this time. If you guys want to be
> > "subscribed", let us know :)
> >
>
> I am subscribed and I can receive mails from the mailing list, but
> it seems that I can't send my emails to the mailing list. Maybe there
> is something on the server which filters my emails. :-(

 Very strange :(

> > PS: I will announce the next incremental next-cjk-patch.diff, which will
> > include everything the 2nd version of next-cjk-patch.diff had plus some
> > changes based on your research (fix in ev_UnixKeyboard.cpp, fix for exporter
> > to .abw and addition of {"CP950","GB2312"} to the table of encodings). So
> > please try it.
> >
> > At a minimum, try changes to xap_UnixFont.cpp and ie_exp_RTF*.cpp - you
> > didn't try them yet.
> >
> > Best regards,
> > -Vlad
>

 I'm very busy today. I will be able to read my mail only in 5 hours,
sorry.
 So please try all variants.

 Best regards,
  -Vlad




This archive was generated by hypermail 2b25 : Wed Nov 01 2000 - 00:44:42 CST