libiconv Korean broken!


Subject: libiconv Korean broken!
From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Thu May 24 2001 - 01:28:51 CDT


I've noticed Korean problems in several areas for a while now and
have finally got around to investigating. It turns out that libiconv
has completely broken code for KSC_5601!

Here's an excerpt from the Unicode conversion data on their site:

# Format: Three tab-separated columns
# Column #1 is the Unified Hangeul code (in hex)
# Column #2 is the Unicode (in hex as 0xXXXX)
# Column #3 is the Unicode name (follows a comment sign, '#')
#
0x8141 0xAC02 # HANGUL SYLLABLE KIYEOK-A-SSANGKIYEOK
<much more snipped>

This means that the byte sequence 0x81 0x41 is a valid multibyte
sequence which should be converted into the 16-bit Unicode value U+AC02.

Here's an excerpt from libiconv's ksc5601.h:

static int
ksc5601_mbtowc (conv_t conv, wchar_t *pwc, const unsigned char *s, int n)
{
  unsigned char c1 = s[0];
  if ((c1 >= 0x21 && c1 <= 0x2c) || (c1 >= 0x30 && c1 <= 0x48) || (c1 >= 0x4a && c1 <= 0x7d)) {
    <much code snipped>
  }
  return RET_ILSEQ;
}

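You'll see that our very first byte, 0x81, does not pass the very first
test! It lies outside all three accepted ranges (0x21-0x2C, 0x30-0x48
and 0x4A-0x7D), so the function returns RET_ILSEQ.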

Something is very wrong here. To check for yourself, try loading any
Korean plain text file while using a Korean locale, and compare the
result with another program that also handles Korean-encoded files.
Saving is also broken, as is input and anything else that treats Korean
as multibyte.

Using Korean as pure Unicode is unaffected of course.

Andrew Dunbar.

-- 
http://linguaphile.sourceforge.net


This archive was generated by hypermail 2b25 : Sat May 26 2001 - 03:51:07 CDT