Re: libiconv Korean broken!


Subject: Re: libiconv Korean broken!
From: Vlad Harchev (hvv@hippo.ru)
Date: Thu May 24 2001 - 09:26:50 CDT


On Thu, 24 May 2001, Andrew Dunbar wrote:

 Hi Andrew,

> I've noticed Korean problems in several areas for a while now and
> have finally got around to investigating. It turns out that libiconv
> has completely broken code for KSC_5601!
>
> Here's an excerpt from the Unicode conversion data from their site:
>
> # Format: Three tab-separated columns
> # Column #1 is the Unified Hangeul code (in hex)
> # Column #2 is the Unicode (in hex as 0xXXXX)
> # Column #3 is the Unicode name (follows a comment sign, '#')
> #
> 0x8141 0xAC02 # HANGUL SYLLABLE KIYEOK-A-SSANGKIYEOK
> <much more snipped>
>
> This means 0x81 0x41 is a correct multibyte sequence which should
> be converted into the sixteen bit value 0xAC02.

> Here's an excerpt from libiconv ksc5601.h:
>
> static int
> ksc5601_mbtowc (conv_t conv, wchar_t *pwc, const unsigned char *s, int n)
> {
> unsigned char c1 = s[0];
> if ((c1 >= 0x21 && c1 <= 0x2c) || (c1 >= 0x30 && c1 <= 0x48) || (c1 >= 0x4a && c1 <= 0x7d)) {
> <much code snipped>
> }
> return RET_ILSEQ;
> }

 Yes, it seems to be broken. Maybe 's' should point right to the
byte after 0x81, i.e. to 0x41, when this function is called?
 
> You'll see that our very first byte, 0x81, does not pass the very first
> test!
>
> Something is very wrong here. To check for yourself, try loading any
> Korean plain text file when using a Korean locale and compare the result
> with another program that also handles Korean-encoded files. Saving is
> also broken, as is input and anything else that treats Korean as multibyte.

  Of course it would be nice to report this to libiconv's maintainer.
 
 Best regards,
  -Vlad



This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:07 CDT