Re: Patch: Fix for Bug 1164, 2nd try


Subject: Re: Patch: Fix for Bug 1164, 2nd try
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Mon May 21 2001 - 12:49:17 CDT


Hubert Figuiere wrote:
>
> Andrew Dunbar wrote:
>
> > Well there are various ways to encode. AbiWord's RTF exporter works fine
> > for CJK but it exports Unicode:
> >
> > \f0 \uc0\u12518 \uc0\u12491 \uc0\u12467 \uc0\u12540 \uc0\u12489
> > \uc0\u12392 \uc0\u12399 \uc0\u20309 \uc0\u12363 \uc0\u-225
> >
> > If I create an RTF with CJK characters in Word it looks like this:
> >
> > \f5\'82\'c9\'82\'d9\'82\'f1\'82\'b2
> >
> > This is what is not being imported correctly.
>
> What you describe is the hex representation of chars, as used for non-ASCII
> chars. The problem is that the current importer parses chars one by one
> instead of run by run, making multi-byte decoding (as described above)
> impossible.
>
> Perhaps someone (I?) should modify the importer to spool chars in a
> buffer before decoding.

Oh that makes perfect sense. From what I read, Japanized versions of
Word can also write 8-bit encoded data directly, so this may also be
what users have reported as working in the past.
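
To make the run-by-run idea concrete, here is a minimal sketch in Python
(not AbiWord's importer code). It spools every \'hh byte of a run into a
buffer and only then decodes it. The helper name decode_hex_run and the
choice of Shift-JIS are assumptions for illustration; a real importer
would take the encoding from the charset of the font selected by \f5.

```python
import re

# Matches one RTF hex escape such as \'82 and captures the two hex digits.
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_run(rtf_run, encoding="shift_jis"):
    """Buffer all hex-escaped bytes in the run, then decode them together.

    decode_hex_run is a hypothetical helper; the default encoding is an
    assumption that happens to fit the Word sample quoted above.
    """
    raw = bytes(int(digits, 16) for digits in HEX_ESCAPE.findall(rtf_run))
    return raw.decode(encoding)


# The run produced by Word, copied from the quoted message.
sample = r"\f5\'82\'c9\'82\'d9\'82\'f1\'82\'b2"
print(decode_hex_run(sample))
```

Because the whole run is decoded at once, each lead byte (0x82 here) is
always paired with its trail byte.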

It looks like it might be beyond me to fix this, though I have been
enjoying it so far (: I'm still not convinced this is the nature
of the problem though, since to me it looks like mbtowc() is only
seeing one *decoded* byte at a time...
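
A rough illustration of why one byte at a time is a problem, again in
Python rather than C and again assuming Shift-JIS: a lead byte decoded
on its own is rejected, while a decoder that keeps state between calls
can be fed single bytes and still produce whole characters. C's
mbrtowc() keeps comparable state in an mbstate_t; whether that is the
right fix for the importer is not something this sketch settles.

```python
import codecs

# First two characters of the Word sample: 0x82 0xC9 and 0x82 0xD9.
raw = bytes([0x82, 0xC9, 0x82, 0xD9])

# Stateless decoding of the lead byte alone fails: it is half a character.
try:
    raw[:1].decode("shift_jis")
    lead_byte_alone_ok = True
except UnicodeDecodeError:
    lead_byte_alone_ok = False

# A stateful decoder buffers the lead byte until the trail byte arrives.
decoder = codecs.getincrementaldecoder("shift_jis")()
pieces = [decoder.decode(bytes([b])) for b in raw]
text = "".join(pieces)

print(lead_byte_alone_ok)
print(pieces)
print(text)
```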

> Please file a bug on this at least in bugzilla if you are not
> volunteering. Since I have to rewrite the RTF parser (or deeply modify
> it), I may put this on my TODO list. But if it is not in bugzilla, I'll
> forget it. If you want to implement this, go ahead. I'll cope with it.

I'll file it right away.

Andrew.

-- 
http://linguaphile.sourceforge.net





This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:05 CDT