Re: Patch: Fix for Bug 1164, 2nd try


Subject: Re: Patch: Fix for Bug 1164, 2nd try
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Mon May 21 2001 - 12:26:27 CDT


Vlad Harchev wrote:
>
> On Tue, 22 May 2001, Andrew Dunbar wrote:
>
> > Vlad Harchev wrote:
> > >
> > > On Mon, 21 May 2001, Andrew Dunbar wrote:
> > >
> > > > Here's my second try.
> > > > I've added more cpg's and fcharset's after doing some tests with
> > > > MS Word and WordPad.
> > > > I've made all the encoding names "CPxxx".
> > > > Bugfix 836 is not broken any longer. Note that not even MS Word
> > > > or WordPad can load 836.rtf, but we can (:
> > > >
> > > > I hope that's everything. CJK multibyte locales are not
> > > > imported correctly yet.
> > >
> > > Why? CJK (Chinese) people said that RTF was being imported and exported
> > > just fine.
> >
> > I'm not sure. It may only be when the CJK text is governed by the
> > font \fcharset tag or maybe something has broken. My hunch is that
> > each byte of the multibyte characters is going through iconv separately
> > but I'm not sure. We might want to look at what happens around
> > line 834 at the call to m_mbtowc.mbtowc().
>
> Did you do tests that showed that the RTF exporter is broken for CJK? And the
> importer?
> I think we polished it and it was working fine, at least on Unix (otherwise
> CJK people would be unable to cut & paste, since AW uses RTF internally when
> copying/pasting from itself).
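The patch quoted at the top adds more \cpg and \fcharset handling and names every encoding "CPxxx". The patch's own table isn't reproduced in this thread, so the following is only a hypothetical sketch (the function name `codepageForFcharset` is made up) using the standard Windows correspondence between \fcharset values and code pages:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map an RTF \fcharsetN value to a "CPxxx"
// encoding name. Returns an empty string for values not listed here,
// so a caller could fall back to some document-level default.
std::string codepageForFcharset(int fcharset)
{
    switch (fcharset) {
    case 0:   return "CP1252"; // ANSI (usually Western European)
    case 128: return "CP932";  // Shift_JIS (Japanese)
    case 129: return "CP949";  // Hangul (Korean)
    case 134: return "CP936";  // GB2312 (Simplified Chinese)
    case 136: return "CP950";  // Big5 (Traditional Chinese)
    case 161: return "CP1253"; // Greek
    case 162: return "CP1254"; // Turkish
    case 177: return "CP1255"; // Hebrew
    case 178: return "CP1256"; // Arabic
    case 186: return "CP1257"; // Baltic
    case 204: return "CP1251"; // Cyrillic
    case 222: return "CP874";  // Thai
    case 238: return "CP1250"; // Central European
    default:  return "";
    }
}
```

The four CJK entries (128, 129, 134, 136) are the double-byte code pages, where a single character spans two bytes.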

Well, there are various ways to encode it. AbiWord's RTF exporter works fine
for CJK, but it exports Unicode:

\f0 \uc0\u12518 \uc0\u12491 \uc0\u12467 \uc0\u12540 \uc0\u12489
\uc0\u12392 \uc0\u12399 \uc0\u20309 \uc0\u12363 \uc0\u-225
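In this form each \uN control word carries the character's code point as a signed 16-bit decimal, which is why U+FF1F (FULLWIDTH QUESTION MARK) appears as \u-225, and \uc0 states that no fallback bytes follow it. A minimal sketch of recovering the code points (the function name is hypothetical, and it ignores escaped backslashes and group structure):

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Extract the code points written as RTF "\uN" control words.
// N is a signed 16-bit value, so code points above U+7FFF appear as
// negative numbers and need 65536 added back. With \uc0 no fallback
// bytes follow each \uN, so this sketch skips nothing after it.
std::vector<std::uint32_t> rtfUnicodeCodePoints(const std::string& rtf)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i + 1 < rtf.size(); ++i) {
        if (rtf[i] != '\\' || rtf[i + 1] != 'u')
            continue;
        std::size_t j = i + 2;
        bool negative = false;
        if (j < rtf.size() && rtf[j] == '-') {
            negative = true;
            ++j;
        }
        if (j >= rtf.size() || !std::isdigit(static_cast<unsigned char>(rtf[j])))
            continue; // a different control word, e.g. "\uc0"
        long value = 0;
        while (j < rtf.size() && std::isdigit(static_cast<unsigned char>(rtf[j])))
            value = value * 10 + (rtf[j++] - '0');
        if (negative)
            value = -value;
        if (value < 0)
            value += 65536; // undo the signed 16-bit wrap-around
        out.push_back(static_cast<std::uint32_t>(value));
        i = j - 1; // resume scanning after the number
    }
    return out;
}
```

Run on the exporter output above, it yields ten code points, from 12518 (U+30E6) to 65311 (U+FF1F), spelling ユニコードとは何か？.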

If I create an RTF with CJK characters in Word it looks like this:

\f5\'82\'c9\'82\'d9\'82\'f1\'82\'b2

This is what is not being imported correctly.
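Here Word writes each byte of the font's code page as a \'hh hex escape. The bytes come in pairs (82 C9, 82 D9, 82 F1, 82 B2), which under Shift_JIS (CP932) are the hiragana に, ほ, ん, ご, so they only decode correctly when each lead byte is combined with its trail byte. A sketch, with a hypothetical function name, that just collects the raw bytes and leaves grouping them into characters to the converter:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Value of one hex digit; callers check std::isxdigit first.
static int hexDigitValue(char c)
{
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9')
        return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Collect the raw bytes written as RTF "\'hh" escapes. These are
// bytes in the font's code page, not characters: in a double-byte
// code page a lead byte and its trail byte form one character.
std::vector<unsigned char> rtfHexEscapes(const std::string& rtf)
{
    std::vector<unsigned char> bytes;
    for (std::size_t i = 0; i + 3 < rtf.size(); ++i) {
        if (rtf[i] == '\\' && rtf[i + 1] == '\''
            && std::isxdigit(static_cast<unsigned char>(rtf[i + 2]))
            && std::isxdigit(static_cast<unsigned char>(rtf[i + 3]))) {
            bytes.push_back(static_cast<unsigned char>(
                hexDigitValue(rtf[i + 2]) * 16 + hexDigitValue(rtf[i + 3])));
            i += 3; // skip the rest of this escape
        }
    }
    return bytes;
}
```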

> And no, the bytes of a multibyte character do not go through iconv one by
> one. Our code is:
>     if (m_mbtowc.mbtowc(wc, (UT_Byte)ch))
>         return AddChar(wc);
> It internally appends 'ch' to an array-of-chars member of m_mbtowc, then
> calls iconv and checks whether it was able to convert the aggregated
> sequence. If it was, the wide character is stored in wc and a nonzero value
> is returned; otherwise 0 is returned (an already aggregated sequence isn't
> lost between calls).
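The aggregation described above can be sketched as a self-contained toy. This is not AbiWord's m_mbtowc: the class name `ToySjisDecoder` is hypothetical, and instead of calling iconv it decodes only ASCII and the Shift_JIS hiragana block (lead byte 0x82, trail bytes 0x9F to 0xF1, which map in order onto U+3041 to U+3093). It keeps the same contract: feed one byte, get a character back only once the pending sequence is complete.

```cpp
#include <cassert>
#include <vector>

// Toy accumulating multibyte-to-wide converter (hypothetical).
// Real code would pass the pending bytes to iconv instead of using
// the tiny hard-coded table below.
class ToySjisDecoder {
public:
    // Feed one byte. Returns true and sets wc once a complete character
    // is available; returns false while the sequence is incomplete
    // (the pending bytes are kept for the next call).
    bool mbtowc(char32_t& wc, unsigned char byte)
    {
        pending_.push_back(byte);
        assert(pending_.size() <= 2);
        unsigned char lead = pending_[0];
        if (pending_.size() == 1) {
            if (lead < 0x80) {        // single-byte ASCII
                wc = lead;
                pending_.clear();
                return true;
            }
            if (isLeadByte(lead))
                return false;         // wait for the trail byte
            pending_.clear();         // e.g. half-width katakana: not handled
            wc = 0xFFFD;
            return true;
        }
        unsigned char trail = pending_[1];
        pending_.clear();
        if (lead == 0x82 && trail >= 0x9F && trail <= 0xF1)
            wc = 0x3041 + (trail - 0x9F); // hiragana block
        else
            wc = 0xFFFD;              // outside the toy's table
        return true;
    }

private:
    // Shift_JIS lead-byte ranges.
    static bool isLeadByte(unsigned char b)
    {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    std::vector<unsigned char> pending_;
};
```

Fed the eight bytes from Word's sample one at a time, it returns false on each 0x82 lead byte and yields U+306B, U+307B, U+3093 and U+3054 on the trail bytes. A converter that forgot the pending lead byte between calls would instead fail on every byte.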

I'll try to look into it soon.

Andrew.

-- 
http://linguaphile.sourceforge.net




