iconv and its suitability for wv

Caolan McNamara (Caolan.McNamara@ul.ie)
Wed, 10 Nov 1999 12:35:26 -0000 (GMT)

Next message: Darren O. Benham: "new libpng"
Previous message: Larry Kollar: "Re: design -- multiple languages in the same document"

There is a reasonably new call, iconv, for converting text from one
charset to another. It's available in at least the new glibc on
Linux and in the standard libc on Solaris and AIX. The glibc2
version is very good and supports everything that wv would need.

wv needs to be able to convert the Windows 8-bit charsets, which
are used particularly by Word 95 and Word 6 documents, into
something widely used on Unix machines. My own preference is to
default to converting to UTF-8 Unicode, but some people cannot
use UTF-8.

Some users want to convert the output of wv into a different
encoding, for instance KOI8-R for Russian. In a perfect world we
could use iconv to convert the Windows codepages on the way into
wv, which uses Unicode internally, and then use iconv again to
convert the Unicode text on the way out if the user wants to
override the default behaviour of using Unicode for the output
as well.
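A minimal sketch of that two-stage pipeline, assuming a glibc-style iconv that accepts the names "CP1251", "UTF-8" and "KOI8-R" (charset names are not standardised across implementations). The helpers convert and cp_to_user are hypothetical, and UTF-8 stands in for wv's internal Unicode representation purely for illustration.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert len bytes of `in` from charset `from` to charset `to`,
 * writing at most outsize-1 bytes to `out` and NUL-terminating it.
 * Returns bytes written, or -1 on failure (unknown charset pair,
 * invalid input, or output buffer too small). */
static long convert(const char *to, const char *from,
                    const char *in, size_t len,
                    char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;              /* this iconv has no table for the pair */

    char *inp = (char *)in;     /* glibc's iconv() takes char ** */
    char *outp = out;
    size_t inleft = len, outleft = outsize - 1;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)       /* flush any pending shift state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}

/* Hypothetical pipeline: Windows codepage in, UTF-8 as the internal
 * form, then the user's chosen output charset. The fixed 1024-byte
 * intermediate buffer is an assumption; longer text makes it fail. */
static long cp_to_user(const char *codepage, const char *user_enc,
                       const char *in, size_t len,
                       char *out, size_t outsize)
{
    char utf8[1024];
    long n = convert("UTF-8", codepage, in, len, utf8, sizeof utf8);
    if (n < 0)
        return -1;
    return convert(user_enc, "UTF-8", utf8, (size_t)n, out, outsize);
}
```

For example, the CP1251 bytes for "Привет" passed through cp_to_user("CP1251", "KOI8-R", ...) come out as the KOI8-R bytes for the same word. On an iconv without the CP1251 table, the first iconv_open fails and the call returns -1, which is the Solaris behaviour described next.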

The problem is that a quick check of the Solaris iconv shows
that it cannot convert the Windows codepages at all. The GNU
glibc2 one is perfect and does everything we would need, but the
other implementations fall a bit flat: they work, but they don't
have the mapping tables required to do the conversion.

Solutions are a bit thin on the ground. The best I can think of
is for wv to run a configure test to find out which of these
cases applies:

a) iconv exists and can do all the conversions, in which case
we use it.
b) iconv exists but cannot do all the conversions, in which
case we use iconv but install our own conversion tables and
make iconv look through ours for conversion routes.
c) iconv does not exist, in which case we have to do all the work
ourselves. We will then only be able to import into Unicode from
the Windows codepages, and only be able to export into the four
or five major charsets, i.e. UTF-8, ISO-8859-15 and KOI8-R, plus
one or two others.
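One way the a/b/c decision could be made is sketched below, assuming that iconv_open returning (iconv_t)-1 is the signal that a codepage table is missing. The enum and function names are hypothetical. Case c in its strict sense (no iconv.h at all) has to be settled at build time by configure, so here "no codepage opens" is treated as equivalent. Note that opening a converter does not prove its table is complete; a stricter probe would convert a known byte and check the result.

```c
#include <iconv.h>
#include <stddef.h>

/* The three cases from the list above (names are hypothetical). */
enum conv_strategy {
    USE_SYSTEM_ICONV,   /* a) iconv handles every codepage       */
    ICONV_PLUS_TABLES,  /* b) iconv exists but lacks some tables */
    BUILTIN_ONLY        /* c) no usable iconv for any codepage   */
};

/* Returns 1 if this iconv can open a from -> to converter. */
static int iconv_supports(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Probe each Windows codepage name for conversion to UTF-8 and pick
 * a strategy. If `missing` is non-NULL it receives the number of
 * codepages this iconv could not open. */
static enum conv_strategy choose_strategy(const char *const *codepages,
                                          size_t n, size_t *missing)
{
    size_t i, bad = 0;

    for (i = 0; i < n; i++)
        if (!iconv_supports("UTF-8", codepages[i]))
            bad++;
    if (missing)
        *missing = bad;

    if (bad == 0)
        return USE_SYSTEM_ICONV;
    if (bad < n)
        return ICONV_PLUS_TABLES;
    return BUILTIN_ONLY;
}
```

Each probe is only an iconv_open/iconv_close pair, so checking the nine codepages CP1250 to CP1258 should be cheap; the configure-time cost would come mostly from compiling and running the test program, not from the probes themselves.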

Does this make sense? It would be nice to know how widespread
the iconv interface is. At
http://www.csn.ul.ie/~caolan/tmp/iconvexample.c
there is an example which should compile if you have iconv, and
will run correctly if your iconv can convert from the Windows
cp1251 (Russian) codepage to Unicode, which I bet a lot of
implementations cannot.

I can easily test for the existence of the iconv header, but the
linker flags differ between platforms: some need -liconv and
others may not.

Once we have identified that a platform *has* iconv, we would
need a mechanism for determining whether each of the Windows
codepages can be converted to Unicode, which could add a lot of
time to configure. The other issue is the state of play on the
Windows platform: is there an equivalent function call there? And
do similar facilities exist elsewhere? All in all it's a bit
complex. For now I will implement option c and convert the
Windows codepages to Unicode manually myself. This will add about
14 C files to the build and make it significantly larger for what
are basically uncommon cases, but I think in the real world there
is no way to avoid this.
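Option c boils down to a lookup table per codepage plus a UTF-8 encoder. Below is a minimal sketch for CP1252 (Western European) only, not wv's actual code. In CP1252 only bytes 0x80 to 0x9F differ from ISO-8859-1, so a 32-entry table suffices; a codepage such as CP1251 would need all 128 upper entries. Mapping the five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD is an assumption, and the function names are hypothetical.

```c
#include <stddef.h>

/* CP1252 bytes 0x80-0x9F. Every other byte maps to the same Unicode
 * code point (0x00-0x7F is ASCII, 0xA0-0xFF matches ISO-8859-1).
 * Undefined bytes become U+FFFD here (an assumption). */
static const unsigned short cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

static unsigned cp1252_to_ucs(unsigned char c)
{
    return (c >= 0x80 && c <= 0x9F) ? cp1252_high[c - 0x80] : c;
}

/* Encode a BMP code point (never a surrogate here) as UTF-8.
 * Returns the number of bytes written (1 to 3). */
static size_t utf8_put(unsigned cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert len CP1252 bytes to UTF-8. `out` must hold 3*len bytes.
 * Returns the number of UTF-8 bytes written (no NUL terminator). */
static size_t cp1252_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        n += utf8_put(cp1252_to_ucs(in[i]), out + n);
    return n;
}
```

For instance, the Word "smart quote" byte 0x93 becomes U+201C, encoded as E2 80 9C. Repeating a table like this for each Windows codepage is what makes the build grow.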

Real Life: Caolan McNamara * Doing: MSc in HCI
Work: Caolan.McNamara@ul.ie * Phone: +353-86-8790257
URL: http://www.csn.ul.ie/~caolan * Sig: an oblique strategy
Accretion

