Re: Patch: Fix for Bug 1164, 2nd try

Subject: Re: Patch: Fix for Bug 1164, 2nd try
From: Vlad Harchev (hvv@hippo.ru)
Date: Mon May 21 2001 - 13:28:07 CDT

Next message: Michael D. Pritchett: "Re: I'm back - sort of"
Previous message: Vlad Harchev: "Re: Patch: Fix for Bug 1164, 2nd try"
In reply to: Andrew Dunbar: "Re: Patch: Fix for Bug 1164, 2nd try"
Next in thread: Andrew Dunbar: "Re: Patch: Fix for Bug 1164, 2nd try"
Reply: Vlad Harchev: "Re: Patch: Fix for Bug 1164, 2nd try"

On Tue, 22 May 2001, Andrew Dunbar wrote:

> Vlad Harchev wrote:
> >
> > On Mon, 21 May 2001, Andrew Dunbar wrote:
> >
> > > Here's my second try.
> > > I've added more cpg's and fcharset's after doing some tests with
> > > MS Word and WordPad.
> > > I've made all the encoding names "CPxxx".
> > > The fix for bug 836 is no longer broken. Note that not even MS Word
> > > or WordPad can load 836.rtf, but we can (:
> > >
> > > I hope that's everything. CJK multibyte locales are not
> > > imported correctly yet.
> >
> > Why? The CJK (Chinese) people told us that RTF was being imported and
> > exported just fine.
>
> I'm not sure. It may only happen when the CJK text is governed by the
> font's \fcharset tag, or maybe something has broken. My hunch is that
> each byte of the multibyte characters is going through iconv separately,
> but I'm not sure. We might want to look at what happens around
> line 834 at the call to m_mbtowc.mbtowc().

Did you run tests showing that the RTF exporter is broken for CJK? And the
importer?
I think we polished it and it was working fine, at least on Unix (otherwise
CJK people would be unable to cut & paste, since AW uses RTF internally when
copying/pasting from itself).

And no, the bytes of a multibyte character do not go through iconv one at a
time. Our code is:

    if (m_mbtowc.mbtowc(wc, (UT_Byte)ch))
        return AddChar(wc);

Internally, mbtowc() appends 'ch' to an array-of-chars member of m_mbtowc,
then calls iconv and checks whether it was able to convert the aggregated
sequence. If it was, the wide char is returned in wc and the call returns
nonzero; otherwise 0 is returned (the sequence aggregated so far isn't lost
between calls).
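To make the aggregation behaviour concrete, here is a minimal C++ sketch of a converter with the same shape as m_mbtowc.mbtowc(). It is not AbiWord's code: the class name IncrementalMbToWc, the UTF-32LE target encoding and the error handling are assumptions. It relies on POSIX iconv reporting an incomplete trailing sequence with errno == EINVAL, which is what lets the pending bytes be kept until the character is complete.

```cpp
// Hypothetical sketch of an incremental multibyte -> wide char converter in the
// spirit of m_mbtowc.mbtowc(); names and details are illustrative, not AbiWord's.
#include <iconv.h>
#include <cassert>   // used by the usage checks
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

class IncrementalMbToWc {
public:
    explicit IncrementalMbToWc(const char* fromCharset)
        : cd_(iconv_open("UTF-32LE", fromCharset)) {
        if (cd_ == (iconv_t)-1)
            throw std::runtime_error("iconv_open failed");
    }
    ~IncrementalMbToWc() { iconv_close(cd_); }
    IncrementalMbToWc(const IncrementalMbToWc&) = delete;
    IncrementalMbToWc& operator=(const IncrementalMbToWc&) = delete;

    // Appends one byte to the pending sequence and tries to convert it.
    // Returns true and stores the code point in wc once the bytes form a
    // complete character; returns false while more bytes are needed.
    bool mbtowc(char32_t& wc, unsigned char ch) {
        buf_.push_back(static_cast<char>(ch));
        char* in = &buf_[0];
        std::size_t inLeft = buf_.size();
        unsigned char out[4];
        char* outPtr = reinterpret_cast<char*>(out);
        std::size_t outLeft = sizeof out;

        std::size_t rc = iconv(cd_, &in, &inLeft, &outPtr, &outLeft);
        if (rc == static_cast<std::size_t>(-1)) {
            if (errno == EINVAL)
                return false;  // incomplete sequence: keep the bytes for the next call
            buf_.clear();      // invalid sequence: drop what was aggregated
            return false;
        }
        buf_.clear();
        if (outLeft != 0)
            return false;      // input consumed but no character produced
        // Assemble the little-endian UTF-32 code unit.
        wc = static_cast<char32_t>(out[0]) |
             (static_cast<char32_t>(out[1]) << 8) |
             (static_cast<char32_t>(out[2]) << 16) |
             (static_cast<char32_t>(out[3]) << 24);
        return true;
    }

private:
    iconv_t cd_;
    std::string buf_;  // bytes aggregated so far
};
```

For example, feeding the UTF-8 bytes 0xE4, 0xB8, 0xAD one call at a time returns false twice and then yields U+4E2D. Stateful encodings such as ISO-2022 would need more care than this sketch takes.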

> > As for the patch, let me suggest a few more case statements for the
> > charset -> iconv charset name switch (an excerpt from the OpenOffice
> > sources; please adapt them appropriately for AW):
> >
> > case 77: eTextEncoding = RTL_TEXTENCODING_APPLE_ROMAN;break;
> > case 130: eTextEncoding = RTL_TEXTENCODING_MS_1361; break;
> > case 255: eTextEncoding = RTL_TEXTENCODING_IBM_850; break;
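A hedged C++ sketch of how those three cases might look once adapted to AW's charset -> iconv name switch. The function name rtfCharsetToIconv is hypothetical. "MACINTOSH", "CP1361" and "CP850" are, as far as I know, names glibc's iconv accepts for Apple Roman, Johab and IBM 850; other iconv implementations may spell them differently. Case 0 (ANSI -> CP1252) is the standard RTF value, added for context; case 2 (symbol) is deliberately left unmapped, since the thread leaves it open.

```cpp
// Hypothetical helper: maps an RTF \fcharset value to an iconv encoding name.
// Returns nullptr when this sketch has no mapping (e.g. 2, the symbol charset).
#include <cassert>   // used by the usage checks
#include <string>

const char* rtfCharsetToIconv(int fcharset) {
    switch (fcharset) {
    case 0:   return "CP1252";     // ANSI (Western European)
    case 77:  return "MACINTOSH";  // OpenOffice: RTL_TEXTENCODING_APPLE_ROMAN
    case 130: return "CP1361";     // OpenOffice: RTL_TEXTENCODING_MS_1361 (Johab)
    case 255: return "CP850";      // OpenOffice: RTL_TEXTENCODING_IBM_850
    default:  return nullptr;      // unknown here; caller falls back
    }
}
```

Returning nullptr rather than a default name lets the caller decide the fallback, for instance the document's \ansicpg code page.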
>
> Excellent! Thanks. Can you see if they handle case 2 for the
> symbol charset?

Sorry, unfortunately I don't have time for all this.

Best regards,
-Vlad


This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:05 CDT