Re: Patch: Fix for Bug 1164, 2nd try


Subject: Re: Patch: Fix for Bug 1164, 2nd try
From: Hubert Figuiere (hfiguiere@teaser.fr)
Date: Mon May 21 2001 - 12:37:53 CDT


Andrew Dunbar wrote:

> Well there are various ways to encode. AbiWord's RTF exporter works fine
> for CJK but it exports Unicode:
>
> \f0 \uc0\u12518 \uc0\u12491 \uc0\u12467 \uc0\u12540 \uc0\u12489
> \uc0\u12392 \uc0\u12399 \uc0\u20309 \uc0\u12363 \uc0\u-225
>
> If I create an RTF with CJK characters in Word it looks like this:
>
> \f5\'82\'c9\'82\'d9\'82\'f1\'82\'b2
>
> This is what is not being imported correctly.

What you describe is the hex representation of chars, as used for
non-ASCII chars. The problem is that the current importer parses chars
one by one instead of run by run, which makes multi-byte decoding (as
needed for the example above) impossible.

Perhaps someone (me?) should modify the importer to spool chars into a
buffer before decoding.
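
To make the idea concrete, here is a rough sketch in Python (not the
actual AbiWord importer code, which is C++). It collects each run of
consecutive \'hh escapes into a byte buffer and decodes the buffer in one
go. The function name decode_hex_runs and the default Shift-JIS codepage
are assumptions for illustration only; a real importer would take the
codepage from the font table of the document.

```python
import re

# One RTF hex escape, e.g. \'82
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")
# A run of consecutive hex escapes, e.g. \'82\'c9\'82\'d9
HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")


def decode_hex_runs(rtf_text, codepage="shift_jis"):
    """Decode runs of \\'hh escapes as multi-byte text.

    Hypothetical helper: the codepage default is an assumption; it is
    not read from the RTF font table here.
    """

    def decode_run(match):
        # Spool every byte of the run into one buffer first ...
        raw = bytes(int(h, 16) for h in HEX_ESCAPE.findall(match.group(0)))
        # ... then decode the whole buffer, so lead and trail bytes
        # of a multi-byte char are seen together.
        return raw.decode(codepage, errors="replace")

    # Everything outside the hex runs is left untouched.
    return HEX_RUN.sub(decode_run, rtf_text)


if __name__ == "__main__":
    sample = r"\'82\'c9\'82\'d9\'82\'f1\'82\'b2"
    print(decode_hex_runs(sample))
```

Decoding the same bytes one at a time cannot work: 0x82 alone is an
incomplete Shift-JIS sequence, and 0xc9 alone is a different (half-width
katakana) char. The sketch ignores the case where a run is split by
whitespace or other control words.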

Please at least file a bug on this in Bugzilla if you are not
volunteering. Since I have to rewrite the RTF parser (or deeply modify
it), I may put this on my TODO list. But if it is not in Bugzilla, I'll
forget it. If you want to implement this, go ahead. I'll cope with it.

Hub


