Re: Fix UT_UTF8String::appendBuf for multibyte charsets.

From: Roland Kay <roland.kay_at_ox.compsoc.net>
Date: Fri Jul 15 2005 - 07:20:51 CEST

Thanks Martin.

The if(...) condition I added, which makes the function ignore the output
of mbtowc() whenever mbtowc() returns false, is based on the way this is
handled in the RTF importer's ParseChar() function. However, this method
worries me a little. If we pass invalid double-byte encoded data that
contains an odd number of bytes, the last byte will sit in iconv's buffer
and corrupt whatever gets converted next.

I've seen this behaviour before in the RTF importer. The importer
mistakenly treated a group containing ASCII text as GB2312. Of course,
that group was turned into gibberish. However, because it contained an odd
number of bytes, the rest of the document was also turned into gibberish,
even though the importer got the encoding right for that part.
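
To illustrate the failure mode, here is a minimal sketch using a toy decoder. ToyDbcsDecoder, decodeChunk() and reset() are hypothetical names made up for this example. They are not AbiWord's UT_Mbtowc or iconv, and the decoder assumes every character is exactly two bytes, which is a simplification of GB2312. It only shows how a leftover lead byte can shift the pairing of everything fed in afterwards, and how dropping the pending byte at a boundary would avoid that.

```cpp
#include <cassert>   // only needed if you add assert-style checks
#include <string>
#include <vector>

// Toy stand-in for a stateful double-byte converter (hypothetical; this is
// NOT UT_Mbtowc or iconv). Every character is assumed to be two bytes.
class ToyDbcsDecoder {
public:
    // Feed one byte. Returns true and sets `out` once a full pair has been
    // consumed; returns false while a lead byte still waits for its trail.
    bool feed(unsigned char byte, unsigned int &out) {
        if (!m_havePending) {
            m_pending = byte;
            m_havePending = true;
            return false;
        }
        out = (static_cast<unsigned int>(m_pending) << 8) | byte;
        m_havePending = false;
        return true;
    }

    // Drop any half-consumed character, e.g. at the end of an RTF group.
    void reset() { m_havePending = false; }

    bool hasPending() const { return m_havePending; }

private:
    unsigned char m_pending = 0;
    bool m_havePending = false;
};

// Decode a chunk, ignoring the output whenever feed() returns false,
// in the same spirit as the if(...) condition discussed above.
std::vector<unsigned int> decodeChunk(ToyDbcsDecoder &dec,
                                      const std::string &bytes) {
    std::vector<unsigned int> chars;
    for (unsigned char b : bytes) {
        unsigned int wc = 0;
        if (dec.feed(b, wc))
            chars.push_back(wc);
    }
    return chars;
}
```

Feeding the three ASCII bytes "abc" through decodeChunk() yields one bogus character and leaves 'c' pending. A correctly encoded pair fed afterwards is then split across two characters, unless reset() is called first, in which case the pair decodes as intended.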

I haven't had a chance to look into this, but it would be nice if it could
be made more robust.

Best wishes,

R.
Received on Fri Jul 15 07:26:22 2005