Re: Patch: Fix for Bug 1164, 2nd try


From: Vlad Harchev (hvv@hippo.ru)
Date: Mon May 21 2001 - 13:28:07 CDT


On Tue, 22 May 2001, Andrew Dunbar wrote:

> Vlad Harchev wrote:
> >
> > On Mon, 21 May 2001, Andrew Dunbar wrote:
> >
> > > Here's my second try.
> > > I've added more cpg's and fcharset's after doing some tests with
> > > MS Word and WordPad.
> > > I've made all the encoding names "CPxxx".
> > > Bugfix 836 is not broken any longer. Note that not even MS Word
> > > or WordPad can load 836.rtf but we can (:
> > >
> > > I hope that's everything. CJK multibyte locales are not
> > > imported correctly yet.
> >
> > Why? CJK (Chinese) people said that RTF was being imported and exported just
> > fine.
>
> I'm not sure. It may only be when the CJK text is governed by the
> font \fcharset tag or maybe something has broken. My hunch is that
> each byte of the multibyte characters is going through iconv separately
> but I'm not sure. We might want to look at what happens around
> line 834 at the call to m_mbtowc.mbtowc().

 Did you run tests that showed the RTF exporter is broken for CJK? And the
importer?
 I think we polished it and it was working fine, at least on Unix (otherwise
CJK people would be unable to cut and paste, since AW uses RTF internally
when copying/pasting within itself).

 And no, the bytes of a multibyte character are not converted by iconv one
at a time. Our code is:
        if (m_mbtowc.mbtowc(wc, (UT_Byte)ch))
                return AddChar(wc);
 It internally appends 'ch' to an array-of-chars member of m_mbtowc, then
calls iconv and checks whether it could convert the aggregated sequence. If
it could, the resulting wide char is stored in 'wc' and a nonzero value is
returned; otherwise 0 is returned (the bytes aggregated so far are kept
between calls).
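A minimal sketch of the byte-aggregating behaviour described above, written against the POSIX iconv API. The class name IncrementalMbToWc is hypothetical and this is not AbiWord's actual m_mbtowc code; it assumes a stateless multibyte source encoding (UTF-8, CP932 and the like) and converts to iconv's "WCHAR_T" target, which glibc and GNU libiconv accept.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>

// Hypothetical stand-in for the m_mbtowc member (not AbiWord's real code).
// Bytes are fed one at a time and buffered until iconv can turn them into
// one wide character. Assumes a stateless source encoding such as UTF-8.
class IncrementalMbToWc {
public:
    explicit IncrementalMbToWc(const char* fromCharset)
        : cd_(iconv_open("WCHAR_T", fromCharset)) {}

    ~IncrementalMbToWc() {
        if (ok()) iconv_close(cd_);
    }

    IncrementalMbToWc(const IncrementalMbToWc&) = delete;
    IncrementalMbToWc& operator=(const IncrementalMbToWc&) = delete;

    bool ok() const { return cd_ != (iconv_t)-1; }

    // Append one byte and try to convert everything buffered so far.
    // Returns true and sets 'out' once a complete character is available;
    // returns false while the sequence is incomplete (bytes are kept)
    // or when the bytes are invalid (they are discarded).
    bool mbtowc(wchar_t& out, unsigned char ch) {
        if (!ok()) return false;
        if (len_ == sizeof(buf_)) len_ = 0;  // drop an overlong sequence
        buf_[len_++] = static_cast<char>(ch);
        assert(len_ <= sizeof(buf_));

        char* in = buf_;
        std::size_t inLeft = len_;
        wchar_t wc = 0;
        char* outPtr = reinterpret_cast<char*>(&wc);
        std::size_t outLeft = sizeof(wc);

        std::size_t r = iconv(cd_, &in, &inLeft, &outPtr, &outLeft);
        if (r == static_cast<std::size_t>(-1)) {
            if (errno != EINVAL) len_ = 0;  // other errors (e.g. EILSEQ): discard
            return false;                   // EINVAL: incomplete, wait for more
        }
        len_ = 0;                        // buffered bytes were consumed
        if (outLeft != 0) return false;  // no character was produced
        out = wc;
        return true;
    }

private:
    iconv_t cd_;
    char buf_[8] = {};
    std::size_t len_ = 0;
};
```

It is driven the same way as the excerpt above: call conv.mbtowc(wc, ch) for every input byte and append wc only when it returns true. For Shift-JIS RTF text one would construct it with "CP932", assuming the local iconv knows that name.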
 
> > As for the patch, let me suggest a few more case statements for the
> > charset -> iconv charset name switch (excerpted from the OpenOffice sources;
> > please adapt them appropriately for AW):
> >
> > case 77: eTextEncoding = RTL_TEXTENCODING_APPLE_ROMAN;break;
> > case 130: eTextEncoding = RTL_TEXTENCODING_MS_1361; break;
> > case 255: eTextEncoding = RTL_TEXTENCODING_IBM_850; break;
>
> Excellent! Thanks. Can you see if they handle case 2 for the
> symbol charset?

 Sorry, unfortunately I don't have time for all this.
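The case statements quoted above belong to a switch that turns an RTF \fcharset number into a converter name. A minimal sketch of such a switch for iconv, using the "CPxxx" naming from the patch where a Windows code page exists: the function name rtfCharsetToIconvName is hypothetical, the \fcharset values are the standard RTF ones, and whether iconv accepts each name (MACINTOSH and CP1361 in particular) depends on the iconv implementation. Symbol (2) and any other unlisted value return nullptr so the caller can decide what to do.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: map an RTF \fcharset value to an iconv encoding name.
// Returns nullptr for values not handled here (e.g. 2 = Symbol), so the
// caller can fall back to its default behaviour.
const char* rtfCharsetToIconvName(int fcharset) {
    switch (fcharset) {
    case 0:   return "CP1252";     // ANSI
    case 77:  return "MACINTOSH";  // Mac Roman (iconv name varies)
    case 128: return "CP932";      // Shift-JIS
    case 129: return "CP949";      // Hangul
    case 130: return "CP1361";     // Johab (some iconvs only know "JOHAB")
    case 134: return "CP936";      // GB2312
    case 136: return "CP950";      // Big5
    case 204: return "CP1251";     // Cyrillic
    case 238: return "CP1250";     // Eastern European
    case 255: return "CP850";      // OEM, per the OpenOffice excerpt
    default:  return nullptr;
    }
}

// Example: assert(std::strcmp(rtfCharsetToIconvName(128), "CP932") == 0);
```

Returning nullptr rather than a default name keeps the "unknown charset" decision with the caller, which is where special cases such as the symbol charset would have to be handled anyway.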

 Best regards,
  -Vlad



This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:05 CDT